ASYMPTOTIC PROPERTIES OF THE MAXIMUM LIKELIHOOD ESTIMATION IN MISSPECIFIED HIDDEN MARKOV MODELS

By Randal Douc* and Eric Moulines†

Let $(Y_k)_{k\in\mathbb{Z}}$ be a stationary sequence on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ taking values in a standard Borel space $\mathsf{Y}$. Consider the associated maximum likelihood estimator with respect to a parametrized family of hidden Markov models such that the law of the observations $(Y_k)_{k\in\mathbb{Z}}$ is not assumed to be described by any of the hidden Markov models of this family. In this paper we investigate the consistency of this estimator in such misspecified models under mild assumptions.



1. Introduction. An assumption underlying most of the classical theory of maximum likelihood is that the "true" distribution of the observations is known to lie within a specified parametric family of distributions. In many settings, it is doubtful that this assumption is satisfied. It is therefore natural to investigate the convergence of the maximum likelihood estimator (MLE) and to identify the possible limit for misspecified models. Such questions have been mainly investigated for models in which observations are independent; see [15], [28]. Much less is known about the behavior of the MLE for dependent observations; see [13] and the references therein.



For independent observations, under mild additional technical conditions, the MLE converges to the parameter which minimizes the relative entropy rate; see [15]. The purpose of this paper is to show that such a result remains true when the observations are from an ergodic process and for classes of parametric distributions associated to hidden Markov models (HMM). An HMM is a bivariate stochastic process $(X_k,Y_k)_{k\ge 0}$, where $(X_k)_{k\ge 0}$ is a Markov chain (often referred to as the state sequence) in a state space $\mathsf{X}$ and, conditionally on $(X_k)_{k\ge 0}$, $(Y_k)_{k\ge 0}$ is a sequence of independent random variables in a state space $\mathsf{Y}$ such that the conditional distribution of $Y_k$ given the state sequence depends on $X_k$ only. The key feature of HMM is that the state sequence $(X_k)_{k\ge 0}$ is not observable, so that statistical inference has to be carried out by means of the observations $(Y_k)_{k\ge 0}$ only. Such problems are far from straightforward due to the fact that the observation process $(Y_k)_{k\ge 0}$ is generally a dependent, non-Markovian time series [even though the bivariate process $(X_k,Y_k)_{k\ge 0}$ is itself a Markov chain].

*SAMOVAR, CNRS UMR 5157 - Institut Telecom/Telecom SudParis, 9 rue Charles Fourier, 91000 Evry
†LTCI, CNRS UMR 8151 - Institut Telecom/Telecom ParisTech, 46, rue Barrault, 75634 Paris Cedex 13, France
This work is supported by the Agence Nationale de la Recherche through the 2009-2012 project Big MC

imsart-aos ver. 2007/12/10 file: soumission.tex date: October 4, 2011

HMM have been intensively used in many scientific disciplines including econometrics [16], [?], biology [?], engineering [18] and neurophysiology [?]; the statistical inference for such models is therefore of significant practical importance. In all these applications, misspecified models are the rule, so it is worthwhile to understand the behavior of the MLE under such a regime.



This work extends previous results in this direction obtained by [24], which are restricted to discrete state-space Markov chains. Our main consistency result for the MLE in misspecified HMM is derived under assumptions which are quite weak, covering general state-space HMM under conditions which are much weaker than those of [?], where a strong mixing condition was imposed on the transition kernels of the hidden chain. Therefore our results can be applied to many models of practical interest, including the Gaussian linear state space model, the discrete state-space HMM and more general nonlinear state-space models.

The paper is organized as follows. In Section 2 we first introduce the setting and notations that are used throughout the paper. In Section 3 we state our main assumptions and results. In Section 4 our main result is used to establish consistency in three general classes of models: linear-Gaussian state space models, finite state models, and nonlinear state space models of the vector ARCH type (this includes the stochastic volatility model and many other models of interest in time series analysis and financial econometrics). Section 5 is devoted to the proof of our main result.

Notations. Some notations pertaining to transition kernels are required. Let $L$ be a (possibly unnormalized) transition kernel on $(\mathsf{X}, \mathcal{X})$, i.e. for any $x \in \mathsf{X}$, $L(x, \cdot)$ is a finite measure on $(\mathsf{X}, \mathcal{X})$ and for any $A \in \mathcal{X}$, $x \mapsto L(x, A)$ is a measurable function from $(\mathsf{X}, \mathcal{X})$ to $([0, 1], \mathcal{B}([0, 1]))$. $L$ acts on bounded functions $f$ on $\mathsf{X}$ and on $\sigma$-finite positive measures $\mu$ on $(\mathsf{X}, \mathcal{X})$ via

$$Lf(x) = \delta_x L f = \int L(x,dy) f(y) , \qquad \mu L(A) = \mu L 1_A = \int \mu(dx) L(x,A) .$$

If $L_1$ and $L_2$ are two transition kernels on $(\mathsf{X}, \mathcal{X})$, then $L_1L_2$ is the transition kernel on $(\mathsf{X}, \mathcal{X})$ given, for any $x \in \mathsf{X}$ and $A \in \mathcal{X}$, by

$$L_1L_2(x,A) = \int L_1(x,dy) L_2(y,A) .$$
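On a finite state space the kernel notation above reduces to matrix algebra, which can make the conventions easier to check. The following is a minimal sketch under that assumption; the three-state matrices `L1`, `L2`, the function `f` and the measure `mu` are hypothetical values chosen for illustration and do not come from the paper.

```python
import numpy as np

# Assumption: finite state space {0, 1, 2}; a (possibly unnormalized) kernel L
# is a nonnegative matrix with L[x, y] = L(x, {y}).
L1 = np.array([[0.5, 0.3, 0.2],
               [0.1, 0.6, 0.3],
               [0.2, 0.2, 0.6]])
L2 = 0.5 * L1                      # unnormalized kernel: every row sums to 0.5

f = np.array([1.0, 2.0, 3.0])      # bounded function on X
mu = np.array([0.2, 0.5, 0.3])     # measure on X

Lf = L1 @ f                        # Lf(x) = sum_y L(x, y) f(y)
muL = mu @ L1                      # (mu L)({y}) = sum_x mu(x) L(x, y)
L1L2 = L1 @ L2                     # L1 L2(x, {y}) = sum_z L1(x, z) L2(z, y)

# The two ways of evaluating mu L1 L2 f agree.
lhs = mu @ L1L2 @ f
rhs = (mu @ L1) @ (L2 @ f)
```

Kernels act on functions from the left and on measures from the right, so the composition $L_1L_2$ is the ordinary matrix product in this finite setting.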



2. Problem statement. We consider a parameterized family of HMM with parameter space $\Theta$, assumed to be a compact metric space. For each parameter $\theta \in \Theta$, the distribution of the HMM is specified by the transition kernel $Q^\theta$ of the Markov chain $(X_k)_{k\ge 0}$, and by the conditional distribution $g^\theta$ of the observation $Y_k$ given the hidden state $X_k$, referred to as the likelihood of the observation.

For any $m \le n$ and any sequence $\{a_k\}_{k\in\mathbb{Z}}$, denote $a_m^n = (a_m,\dots,a_n)$ and for any probability measure $\chi$ on $(\mathsf{X},\mathcal{X})$, define the likelihood of the observations by

$$p^\theta_\chi(y_m^n) = \int\cdots\int \chi(dx_m)\, g^\theta(x_m,y_m) \prod_{p=m+1}^{n} Q^\theta(x_{p-1},dx_p)\, g^\theta(x_p,y_p) ,$$

$$p^\theta_\chi(y_p^n \mid y_m^{p-1}) = p^\theta_\chi(y_m^n)/p^\theta_\chi(y_m^{p-1}) , \qquad m<p\le n ,$$

with the standard convention $\prod_{p=m}^{n} a_p = 1$ if $m > n$.
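For a finite state space the iterated integral defining the likelihood becomes a sum over state paths, which the usual normalized forward recursion evaluates in linear time. The sketch below assumes a finite state space; the function name `log_likelihood` and the numerical values of `chi`, `Q` and `g` are hypothetical and serve only as an illustration.

```python
import numpy as np

def log_likelihood(chi, Q, g):
    """Return log p_chi(y_0^{n-1}) for a finite-state HMM.

    chi : (d,) initial distribution
    Q   : (d, d) transition matrix, Q[x, x'] = Q(x, {x'})
    g   : (n, d) array with g[k, x] = g(x, y_k)
    """
    alpha = chi * g[0]             # chi(dx_0) g(x_0, y_0)
    c = alpha.sum()
    loglik = np.log(c)
    alpha = alpha / c              # normalize to avoid numerical underflow
    for k in range(1, len(g)):
        alpha = (alpha @ Q) * g[k] # propagate with Q, weight with g(., y_k)
        c = alpha.sum()
        loglik += np.log(c)
        alpha = alpha / c
    return loglik

# Hypothetical two-state example with three observations.
chi = np.array([0.5, 0.5])
Q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
g = np.array([[0.7, 0.1],
              [0.4, 0.3],
              [0.2, 0.6]])
ll = log_likelihood(chi, Q, g)
```

Each normalizing constant `c` is the one-step predictive density of the current observation given the previous ones, so the recursion also yields the conditional likelihoods defined above.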

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and let $(Y_k)_{k\in\mathbb{Z}}$ be a stationary ergodic stochastic process taking values in $(\mathsf{Y},\mathcal{Y})$. We denote by $\mathbb{P}^Y$ the image probability of $\mathbb{P}$ by $(Y_k)_{k\in\mathbb{Z}}$ on the product space $(\mathsf{Y}^{\mathbb{Z}},\mathcal{Y}^{\otimes\mathbb{Z}})$, and by $\mathbb{E}^Y$ the associated expectation. We stress that the distribution $\mathbb{P}^Y$ may or may not belong to the parametric family of distributions specified by the transition kernels $\{(Q^\theta, g^\theta),\theta \in \Theta\}$. If $\mathbb{P}^Y$ does not belong to this family, the model is said to be misspecified.

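If $\chi$ is a probability measure on $(\mathsf{X},\mathcal{X})$, we define the Maximum Likelihood Estimator (MLE) associated to the initial distribution $\chi$ by

$$\hat\theta_{\chi,n} = \operatorname{argmax}_{\theta\in\Theta} \ln p^\theta_\chi(Y_0^{n-1}) . \tag{1}$$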
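To make the definition of the MLE concrete in a misspecified setting, the sketch below maximizes the log-likelihood over a discretized compact parameter set. Everything in it is an illustrative assumption rather than part of the paper: the two-state Gaussian family indexed by `theta`, the fixed transition matrix, the i.i.d. Laplace data (a stationary ergodic sequence that no member of the family reproduces) and the grid search.

```python
import numpy as np

def loglik(theta, y, chi=np.array([0.5, 0.5])):
    """ln p_chi^theta(y_0^{n-1}) for a hypothetical two-state Gaussian HMM.

    State 0 emits N(-theta, 1), state 1 emits N(+theta, 1); the transition
    matrix is held fixed, so theta is the only free parameter.
    """
    Q = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
    means = np.array([-theta, theta])
    pred, total = chi, 0.0
    for yk in y:
        w = pred * np.exp(-0.5 * (yk - means) ** 2) / np.sqrt(2.0 * np.pi)
        c = w.sum()                # predictive density of y_k given the past
        total += np.log(c)
        pred = (w / c) @ Q         # predictive law of the next hidden state
    return total

rng = np.random.default_rng(1)
y = rng.laplace(size=500)          # misspecified data: not generated by the family

grid = np.linspace(0.0, 2.0, 21)   # discretized compact parameter set
values = np.array([loglik(t, y) for t in grid])
theta_hat = grid[np.argmax(values)]
```

A grid search is used only because it is transparent; any optimizer over the compact parameter set would serve the same purpose.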



The study of asymptotic properties of the MLE in HMM was initiated in the seminal work of Baum and Petrie [3, ?] in the 1960s. In these papers, the model is assumed to be well specified and the state space $\mathsf{X}$ and the observation space $\mathsf{Y}$ were both presumed to be finite sets. More than two decades later, Leroux [?] proved consistency for well-specified models in the case that $\mathsf{X}$ is a finite set and $\mathsf{Y}$ is a general state space. The consistency of the MLE in more general HMM has subsequently been investigated for well-specified models in a series of contributions [?, ?, ?, 20, 21] using different methods. A general consistency result for HMM has been developed in [?].

Though the consistency results above differ in the details of their proofs, all proofs have a common thread which serves also as the starting point for this paper. Denote by $p^\theta_\chi(Y_0^{n-1})$ the likelihood of the observations for the HMM with parameter $\theta \in \Theta$ and initial distribution $\chi$. The first step of the proof aims to establish that for any $\theta \in \Theta$, there is a constant $\ell(\theta)$ such that

$$\lim_{n\to\infty} n^{-1}\log p^\theta_\chi(Y_0^{n-1}) = \lim_{n\to\infty} n^{-1}\,\mathbb{E}\left[\log p^\theta_\chi(Y_0^{n-1})\right] = \ell(\theta) , \qquad \mathbb{P}\text{-a.s.}$$

Up to an additive constant, $\theta \mapsto \ell(\theta)$ is the negated relative entropy rate between the distribution of the observations and $p^\theta_\chi(\cdot)$. When the model is well-specified and $\theta = \theta_\star$ is the true value of the parameter, this convergence follows from the generalized Shannon-Breiman-McMillan theorem [?]; for misspecified models or for well-specified models with $\theta \ne \theta_\star$, the existence of the limit is far from obvious.

The second step of the proof aims to prove that the maximizer of the likelihood $\theta \mapsto \log p^\theta_\chi(Y_0^{n-1})$ converges $\mathbb{P}$-a.s. to the maximizer of $\theta \mapsto \ell(\theta)$, that is, to the minimizer of the relative entropy rate. Together, these two steps show that the MLE is a natural estimator for the parameters which minimize the relative entropy rate in the parametric family $\{(Q^\theta, g^\theta),\theta \in \Theta\}$.

Let us note that one could write the likelihood as

$$\frac{1}{n}\log p^\theta_\chi(Y_0^{n-1}) = \frac{1}{n}\sum_{k=0}^{n-1} \log p^\theta_\chi(Y_k \mid Y_0^{k-1}) ,$$

where $p^\theta_\chi(Y_k \mid Y_0^{k-1})$ denotes the conditional density of $Y_k$ given $Y_0^{k-1}$ under the misspecified model with parameter $\theta$ (that is, the one-step predictive density), the term $k=0$ being $p^\theta_\chi(Y_0)$. If the limit of $p^\theta_\chi(Y_0 \mid Y_{-k}^{-1})$ as $k \to \infty$ can be shown to exist $\mathbb{P}$-a.s. and to be $\mathbb{P}$-integrable, the convergence of the log-likelihood to the relative entropy rate follows from the Birkhoff ergodic theorem, since the process $(Y_k)_{k\in\mathbb{Z}}$ is assumed to be ergodic. This result provides an explicit representation of the relative entropy rate $\ell(\theta)$ as the expectation of the limit, $\ell(\theta) = \mathbb{E}\left[\log \pi^\theta(Y_{-\infty}^{0})\right]$. The limit $\pi^\theta(Y_{-\infty}^{0})$ might be interpreted as the conditional likelihood of $Y_0$ given the whole past $Y_{-\infty}^{-1}$, but we must refrain from considering this quantity as a conditional density.
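The decomposition of the log-likelihood into one-step predictive terms, and the stabilization of its normalized value along a stationary ergodic sequence, can be observed numerically. The sketch below is an illustration under assumptions that are not in the paper: a two-state Gaussian HMM with unit emission variance is fitted to data generated by a Gaussian AR(1) process, so the model is misspecified; all names and numerical values are hypothetical.

```python
import numpy as np

def log_predictive(chi, Q, means, y):
    """Return log p(y_k | y_0^{k-1}), k = 0..n-1, for a Gaussian-emission HMM."""
    out = np.empty(len(y))
    pred = chi.copy()                          # law of X_k given y_0^{k-1}
    for k, yk in enumerate(y):
        gk = np.exp(-0.5 * (yk - means) ** 2) / np.sqrt(2.0 * np.pi)
        c = pred @ gk                          # one-step predictive density
        out[k] = np.log(c)
        pred = (pred * gk / c) @ Q             # filter update, then prediction
    return out

rng = np.random.default_rng(0)
n = 4000
y = np.empty(n)
y[0] = rng.normal(scale=np.sqrt(1.0 / (1.0 - 0.25)))   # stationary start
for k in range(1, n):
    y[k] = 0.5 * y[k - 1] + rng.normal()               # AR(1): not an HMM of the family

chi = np.array([0.5, 0.5])
Q = np.array([[0.9, 0.1],
              [0.1, 0.9]])
means = np.array([-1.0, 1.0])

lp = log_predictive(chi, Q, means, y)
running_avg = np.cumsum(lp) / np.arange(1, n + 1)      # n^{-1} log p(y_0^{n-1})
```

Plotting `running_avg` against `n` gives a rough picture of the convergence to a constant that the first step of the proof establishes.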

Such an approach was used in [?] for finite state-space and was later extended by [?] to general state-space, but under stringent technical conditions (uniform mixing of the Markov kernel, which more or less restricts the validity of the results to compact state-spaces, leaving aside important models, such as linear Gaussian state-space models).

Alternatively, the predictive distribution $p^\theta_\chi(Y_k \mid Y_0^{k-1})$ can be expressed as a component of the state of a measure-valued Markov chain; in this approach, the existence of the limiting relative entropy rate $\ell(\theta)$ follows from the ergodic theorem for Markov chains provided that this Markov chain can be shown to be ergodic. This approach was used in [?, ?, ?] and later extended to misspecified models by [24]. This technique is adequate for finite state-space Markov chains, but does not extend easily to general state-space Markov chains; see [?].

In [22], the existence of the relative entropy rate is established by means of Kingman's subadditive ergodic theorem (the same approach is used indirectly in [?], which invokes the Furstenberg-Kesten theory of random matrix products). After some additional work, an explicit representation of the relative entropy rate is again obtained. However, as is noted in [22, p. 136], the latter is surprisingly difficult, as Kingman's ergodic theorem does not directly yield a representation of the limit as an expectation.

For completeness, we note that a recent attempt [?] to prove consistency of the MLE for general HMM contains very serious problems in the proof [17] (not addressed in [3]), and therefore fails to establish the claimed results.

In this paper, we prove consistency of the MLE for general HMM in misspecified models under quite general assumptions. Our proof follows broadly the original approach of [?, ?], but relaxes the very restrictive technical conditions used in these works and extends the analysis to misspecified models. The key technique to obtain this result is to establish the exponential forgetting of the filtering distribution; this result is obtained by using a coupling technique originally introduced in [?] and refined in [?].



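3. Assumptions and main results. For any integer $t \ge 1$, $\theta \in \Theta$ and any sequence $y_0^{t-1} \in \mathsf{Y}^t$, consider the unnormalized kernel $L^\theta\langle y_0^{t-1}\rangle$ on $(\mathsf{X}, \mathcal{X})$ defined for all $x_0 \in \mathsf{X}$ and $A \in \mathcal{X}$ by

$$L^\theta\langle y_0^{t-1}\rangle(x_0,A) = \int\cdots\int \left[\prod_{i=0}^{t-1} g^\theta(x_i,y_i)\, Q^\theta(x_i,dx_{i+1})\right] 1_A(x_t) . \tag{2}$$

Note that, for any $t \ge 1$, $\theta \in \Theta$, $x_0 \in \mathsf{X}$, and $y_0^{t-1} \in \mathsf{Y}^t$,

$$L^\theta\langle y_0^{t-1}\rangle(x_0,\mathsf{X}) = p^\theta_{x_0}(y_0^{t-1}) , \tag{3}$$

where for $x \in \mathsf{X}$ and $s \le t$, $p^\theta_x(y_s^t)$, the likelihood of the observation $y_s^t$ starting from state $x$, is a shorthand notation for $p^\theta_{\delta_x}(y_s^t)$.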

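Definition 1. Let $r$ be an integer. A set $C \in \mathcal{X}$ is an $r$-local Doeblin set with respect to the family $\{Q^\theta, g^\theta\}_{\theta\in\Theta}$ if there exist positive functions $\epsilon_C^- : \mathsf{Y}^r \to \mathbb{R}^+$, $\epsilon_C^+ : \mathsf{Y}^r \to \mathbb{R}^+$ and a family of probability measures $\{\lambda^\theta_C\langle z\rangle\}_{\theta\in\Theta, z\in\mathsf{Y}^r}$ and of positive functions $\{\varphi^\theta_C\langle z\rangle\}_{\theta\in\Theta, z\in\mathsf{Y}^r}$ such that for any $\theta \in \Theta$ and $z \in \mathsf{Y}^r$, $\lambda^\theta_C\langle z\rangle(C) = 1$ and, for any $A \in \mathcal{X}$ and $x \in C$,

$$\epsilon_C^-(z)\, \varphi^\theta_C\langle z\rangle(x)\, \lambda^\theta_C\langle z\rangle(A) \le L^\theta\langle z\rangle(x, A\cap C) \le \epsilon_C^+(z)\, \varphi^\theta_C\langle z\rangle(x)\, \lambda^\theta_C\langle z\rangle(A) . \tag{4}$$

This implies that for any measurable nonnegative function $f$ on $(\mathsf{X},\mathcal{X})$, $x \in C$ and any $z \in \mathsf{Y}^r$,

$$\epsilon_C^-(z)\, \varphi^\theta_C\langle z\rangle(x)\, \lambda^\theta_C\langle z\rangle(1_C f) \le \delta_x L^\theta\langle z\rangle(1_C f) \le \epsilon_C^+(z)\, \varphi^\theta_C\langle z\rangle(x)\, \lambda^\theta_C\langle z\rangle(1_C f) .$$

Note that when $r = 1$, then, for $y \in \mathsf{Y}$, $L^\theta\langle y\rangle(x,A) = g^\theta(x,y)\,Q^\theta(x, A)$. We require that the condition is satisfied for any $\theta \in \Theta$, but this is not a serious restriction since $\Theta$ is assumed to be compact.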

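Remark 1. To illustrate this condition, consider the case $r = 1$. Assume that for some set $C$, there exist positive constants $\epsilon_C^-$, $\epsilon_C^+$ and a family of probability measures $\{\lambda^\theta_C\}_{\theta\in\Theta}$ such that for any $\theta \in \Theta$, $\lambda^\theta_C(C) = 1$ and, for any $A \in \mathcal{X}$ and $x \in C$,

$$\epsilon_C^-\, \lambda^\theta_C(A) \le Q^\theta(x, A \cap C) \le \epsilon_C^+\, \lambda^\theta_C(A) .$$

Then, clearly $L^\theta\langle y\rangle(x,A) = g^\theta(x,y)\,Q^\theta(x,A)$ satisfies (4) with $\varphi^\theta_C\langle y\rangle(x) = g^\theta(x,y)$, where $\epsilon_C^-$ and $\epsilon_C^+$ are positive constants. In this case $C$ is a 1-local Doeblin set with respect to $Q^\theta$; see [?].

Remark 2. Local Doeblin sets share some similarities with 1-small sets in the theory of Markov chains over general state spaces (see [?, Chapter 5]). Recall that a set $C$ is 1-small for the kernel $Q^\theta$, $\theta \in \Theta$, if there exist a probability measure $\lambda^\theta$ and a constant $\epsilon_C > 0$ such that $\lambda^\theta(C) = 1$ and, for all $x \in C$ and $A \in \mathcal{X}$, $Q^\theta(x,A \cap C) \ge \epsilon_C \lambda^\theta(A \cap C)$. In particular, a local Doeblin set is 1-small with $\epsilon_C = \epsilon_C^-$ and $\lambda^\theta = \lambda^\theta_C$. The main difference stems from the fact that we impose both a lower and an upper bound, and we impose that the minorizing and the majorizing measures are the same.

(A1) There exist an integer $r \ge 1$ and a set $K \in \mathcal{Y}^{\otimes r}$ such that

(i) $\mathbb{P}\left[Y_0^{r-1} \in K\right] > 2/3$,

(ii) For all $\eta > 0$, there exists an $r$-local Doeblin set $C \in \mathcal{X}$ such that for all $\theta \in \Theta$ and for all $y_0^{r-1} \in K$,

$$\sup_{x_0 \in C^c} p^\theta_{x_0}(y_0^{r-1}) \le \eta \sup_{x_0 \in \mathsf{X}} p^\theta_{x_0}(y_0^{r-1}) < \infty \tag{5}$$

and

$$\inf_{y_0^{r-1} \in K} \frac{\epsilon_C^-(y_0^{r-1})}{\epsilon_C^+(y_0^{r-1})} > 0 , \tag{6}$$

where the functions $\epsilon_C^+$ and $\epsilon_C^-$ are defined in Definition 1.

(iii) There exists a set $D$ such that

$$\mathbb{E}\left[\ln^- \inf_{\theta\in\Theta} \inf_{x \in D} L^\theta\langle Y_0^{r-1}\rangle(x, D)\right] < \infty . \tag{7}$$

(A2) (i) For any $\theta \in \Theta$, the function $g^\theta : (x,y) \in \mathsf{X} \times \mathsf{Y} \mapsto g^\theta(x,y)$ is positive,

(ii) $\mathbb{E}\left[\ln^+ \sup_{\theta\in\Theta} \sup_{x\in\mathsf{X}} g^\theta(x,Y_0)\right] < \infty$.

(A3) There exists $p \in \mathbb{N}$ such that for any $x \in \mathsf{X}$ and $n \ge p$, $\mathbb{P}$-a.s. the function $\theta \mapsto p^\theta_x(Y_0^n)$ is continuous on $\Theta$.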

Remark 3. (A1)-(ii) formalizes the intuition that a block of consecutive observations $y_0^{r-1}$ provides information on the state $x_0$, provided that $y_0^{r-1}$ belongs to a "nice" set $K$. It is formulated here by controlling the concentration, outside local Doeblin sets, of the density of the state $x_0$ conditional on the block of observations $y_0^{r-1}$. This condition is reminiscent of the controllability condition for linear state space models.

Remark 4. (A2) assumes that the conditional likelihood $g^\theta$ is positive. The case where $g^\theta$ can vanish typically requires different conditions (see [?], [22]). The second condition can be read as a generalized moment condition on $Y_0$. It is satisfied in many examples of interest.

Remark 5. To check (A1)-(iii) one may for example check that

(i) $\inf_{x\in D} \inf_{\theta\in\Theta} Q^\theta(x, D) > 0$,

(ii) $\mathbb{E}\left[\ln^- \inf_{\theta\in\Theta}\inf_{x\in D} g^\theta(x,Y_0)\right] < \infty$.

This condition is satisfied if $(x,\theta) \mapsto g^\theta(x,y)$ is continuous and $D$ is a compact small set, that is, for all $\theta \in \Theta$, there exist a probability measure $\nu^\theta$ such that $\nu^\theta(D) = 1$ and a constant $\delta > 0$ such that, for all $x \in D$ and $A \in \mathcal{X}$, $Q^\theta(x, A) \ge \delta \nu^\theta(A)$. Note however that (A1)-(iii) is far weaker than imposing that the set $D$ is 1-small. This is important to deal with examples for which the transition kernel $Q^\theta(x,\cdot)$ does not admit a density with respect to some fixed dominating measure; see for example Section 4.1.



Remark 6. (A3) is in general the consequence of the continuity of the kernel $\theta \mapsto Q^\theta(x,\cdot)$ and of the function $\theta \mapsto g^\theta(x, y)$, using classical techniques to deal with integrals depending on a parameter.

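Remark 7. According to (3), the bound (5) may also be rewritten in terms of the kernel $L^\theta\langle y_0^{r-1}\rangle$ as:

$$\sup_{x_0 \in C^c} L^\theta\langle y_0^{r-1}\rangle(x_0,\mathsf{X}) \le \eta \sup_{x_0 \in \mathsf{X}} L^\theta\langle y_0^{r-1}\rangle(x_0,\mathsf{X}) < \infty .$$

The convergence of the relative entropy is achieved for initial distributions belonging to a particular class of initial probability distributions. For the integer $r$ and the set $D \in \mathcal{X}$ defined in (A1), let $\mathcal{M}(D,r)$ be the subset of the set $\mathcal{P}(\mathsf{X}, \mathcal{X})$ of probability measures on $(\mathsf{X}, \mathcal{X})$ given by

$$\mathcal{M}(D,r) = \left\{\chi \in \mathcal{P}(\mathsf{X},\mathcal{X}) : \mathbb{E}\left[\ln^- \inf_{\theta\in\Theta} \chi L^\theta\langle Y_0^{u-1}\rangle 1_D\right] < \infty , \text{ for all } u \in \{1, \dots, r\}\right\} . \tag{8}$$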



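Proposition 1. Assume (A1)-(A2). Then,

(i) For any $\theta \in \Theta$, there exists a measurable function $\pi^\theta_Y$ such that for any probability measure $\chi \in \mathcal{M}(D,r)$,

$$\lim_{m\to\infty} p^\theta_\chi(Y_0 \mid Y_{-m}^{-1}) = \pi^\theta_Y(Y_{-\infty}^{0}) , \qquad \mathbb{P}\text{-a.s.}$$

Moreover,

$$\mathbb{E}\left[\left|\ln \pi^\theta_Y(Y_{-\infty}^{0})\right|\right] < \infty . \tag{9}$$

(ii) For any $\theta \in \Theta$ and any probability measure $\chi \in \mathcal{M}(D,r)$,

$$\lim_{n\to\infty} n^{-1} \ln p^\theta_\chi(Y_0^{n-1}) = \ell(\theta) , \qquad \mathbb{P}\text{-a.s.}$$

where $\ell(\theta) = \mathbb{E}\left[\ln \pi^\theta_Y(Y_{-\infty}^{0})\right]$.

Theorem 2. Assume (A1)-(A3). Then, for any probability measure $\chi \in \mathcal{M}(D,r)$,

$$\lim_{n\to\infty} d(\hat\theta_{\chi,n}, \Theta_\star) = 0 , \qquad \mathbb{P}\text{-a.s.}$$

where $\Theta_\star \subset \Theta$ is defined by $\Theta_\star = \operatorname{argmax}_{\theta\in\Theta} \ell(\theta)$.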

A simple sufficient condition can be proposed to ensure that $\chi \in \mathcal{M}(D,r)$.

Proposition 3. Assume there exists a sequence of sets $D_u \in \mathcal{X}$, $u \in \{0, \dots, r-1\}$, such that (setting $D_r = D$ for notational convenience), for some $\delta > 0$,

$$\inf_{\theta\in\Theta} \inf_{x_{u-1} \in D_{u-1}} Q^\theta(x_{u-1}, D_u) \ge \delta , \qquad u \in \{1, \dots, r\} \tag{10}$$

and

$$\mathbb{E}\left[\ln^- \inf_{\theta\in\Theta} \inf_{x \in D_u} g^\theta(x,Y_0)\right] < \infty , \qquad \text{for } u \in \{0, \dots, r\} . \tag{11}$$

Then, any initial distribution $\chi$ on $(\mathsf{X}, \mathcal{X})$ satisfying $\chi(D_0) > 0$ belongs to $\mathcal{M}(D,r)$.

Remark 8. To check (11) we typically assume that, for any given $y \in \mathsf{Y}$, the function $(x,\theta) \mapsto g^\theta(x,y)$ is continuous and that $D_i \times \Theta$ is a compact set, $i \in \{0, \dots, r-1\}$. This condition then translates into an assumption on some generalized moments of the process $Y$.

To check (10), the following lemma is useful.

Lemma 4. Assume that $\mathsf{X} = \mathbb{R}^d$ for some integer $d > 0$ and that $\mathcal{X}$ is the associated Borel $\sigma$-field. Assume in addition that, for any open subset $O \in \mathcal{X}$, the function $(x,\theta) \mapsto Q^\theta(x,O)$ is lower semi-continuous on the product space $\mathsf{X} \times \Theta$. Then, for any $\delta > 0$ and any compact subset $D_0 \in \mathcal{X}$,
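there exists a sequence of compact subsets $D_u$, $u \in \{0, \dots, r-1\}$, satisfying

$$\inf_{\theta\in\Theta} \inf_{x_{u-1} \in D_{u-1}} Q^\theta(x_{u-1}, D_u) \ge 1-\delta , \qquad u \in \{1, \dots, r\} . \tag{12}$$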



4. Applications. In this section, we develop three classes of examples. In Section 4.1 we consider linear Gaussian state space models. This is obviously a very important model, which is used routinely to analyze time series. We analyze this model under assumptions which are very general and might serve to illustrate the stated assumptions. In Section 4.2 we consider the classic case where the state space of the underlying Markov chain is a finite set. Finally, in Section 4.3 we develop a general class of nonlinear state space models. In all these examples, we will find that the assumptions of Theorem 2 are satisfied under general assumptions.

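4.1. Gaussian linear state space models. Gaussian linear state space models form an important class of HMM. In this setting, let $\mathsf{X} = \mathbb{R}^{d_x}$ and $\mathsf{Y} = \mathbb{R}^{d_y}$ for some integers $d_x$, $d_y$ and let $\Theta$ be a compact parameter space. The model is specified by

$$X_{k+1} = A_\theta X_k + R_\theta U_k , \qquad Y_k = B_\theta X_k + S_\theta V_k , \tag{13}$$

where $\{(U_k,V_k)\}_{k\ge 0}$ is an i.i.d. sequence of Gaussian vectors with zero mean and identity covariance matrix, independent of $X_0$. Here $U_k$ is $d_u$-dimensional, $V_k$ is $d_y$-dimensional, and the matrices $A_\theta$, $R_\theta$, $B_\theta$, $S_\theta$ have the appropriate dimensions.

For any integer $n$, define $\mathcal{O}_{\theta,n}$ and $\mathcal{C}_{\theta,n}$, the observability and the controllability matrices, by

$$\mathcal{O}_{\theta,n} = \begin{bmatrix} B_\theta \\ B_\theta A_\theta \\ B_\theta A_\theta^2 \\ \vdots \\ B_\theta A_\theta^{n-1} \end{bmatrix} \quad\text{and}\quad \mathcal{C}_{\theta,n} = \begin{bmatrix} A_\theta^{n-1}R_\theta & A_\theta^{n-2}R_\theta & \cdots & R_\theta \end{bmatrix} . \tag{14}$$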

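It is assumed in the sequel that for any $\theta \in \Theta$, the following hold:

(L1) The pair $[A_\theta, B_\theta]$ is observable and the pair $[A_\theta, R_\theta]$ is controllable, that is, there exists an integer $r$ such that the observability matrix $\mathcal{O}_{\theta,r}$ and the controllability matrix $\mathcal{C}_{\theta,r}$ are full rank.

(L2) The measurement noise covariance matrix $S_\theta$ is full rank.

(L3) The functions $\theta \mapsto A_\theta$, $\theta \mapsto R_\theta$, $\theta \mapsto B_\theta$ and $\theta \mapsto S_\theta$ are continuous on $\Theta$.

(L4) $\mathbb{E}\left[\|Y_0\|^2\right] < \infty$.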
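Condition (L1) can be checked numerically for a given parameter value by forming the two matrices of (14) and computing their ranks. The sketch below does this for a single, hypothetical choice of $(A_\theta, R_\theta, B_\theta)$ with a two-dimensional state, scalar state noise and scalar observation; the helper names are illustrative, and verifying (L1) over the whole parameter set would require repeating the check for every $\theta$.

```python
import numpy as np

def observability(A, B, n):
    """Stack B, BA, ..., BA^{n-1} (the matrix O_{theta,n} of (14))."""
    blocks, M = [], B.copy()
    for _ in range(n):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def controllability(A, R, n):
    """Concatenate A^{n-1}R, ..., AR, R (the matrix C_{theta,n} of (14))."""
    blocks, M = [], R.copy()
    for _ in range(n):
        blocks.append(M)
        M = A @ M
    return np.hstack(blocks[::-1])

# Hypothetical parameter value: d_x = 2, d_u = 1, d_y = 1.
A = np.array([[0.5, 1.0],
              [0.0, 0.3]])
R = np.array([[0.0],
              [1.0]])
B = np.array([[1.0, 0.0]])

r = 2
O = observability(A, B, r)
C = controllability(A, R, r)
full_rank = bool(np.linalg.matrix_rank(O) == A.shape[0]
                 and np.linalg.matrix_rank(C) == A.shape[0])
```

In this example the state noise is scalar, so $R_\theta R_\theta^T$ is rank deficient, yet both matrices have full rank with $r = 2$.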

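We now check the assumptions of Theorem 2.

The dimension $d_u$ of the state noise vector $U_k$ is in many situations smaller than the dimension $d_x$ of the state vector and hence $R_\theta R_\theta^T$ may be rank deficient.

Some additional notations are needed. For any positive matrix $A$ and any vector $z$ of appropriate dimension, denote $\|z\|_A^2 = z^T A^{-1} z$. Define for any integer $n$

$$\mathcal{F}_{\theta,n} = \mathcal{D}_{\theta,n}\mathcal{D}_{\theta,n}^T + \mathcal{S}_{\theta,n}\mathcal{S}_{\theta,n}^T , \tag{15}$$

where $^T$ denotes the transpose and

$$\mathcal{D}_{\theta,n} = \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 \\ B_\theta R_\theta & 0 & \cdots & 0 & 0 \\ B_\theta A_\theta R_\theta & B_\theta R_\theta & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ B_\theta A_\theta^{n-2} R_\theta & B_\theta A_\theta^{n-3} R_\theta & \cdots & B_\theta R_\theta & 0 \end{bmatrix} , \qquad \mathcal{S}_{\theta,n} = \begin{bmatrix} S_\theta & 0 & \cdots & 0 \\ 0 & S_\theta & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & S_\theta \end{bmatrix} .$$

Under (L2), for any $n \ge r$, the matrix $\mathcal{F}_{\theta,n}$ is positive definite. The likelihood of the observations $y_0^{n-1}$ starting from $x_0$ is given by

$$p^\theta_{x_0}(y_0^{n-1}) = (2\pi)^{-n d_y/2} \det{}^{-1/2}(\mathcal{F}_{\theta,n}) \exp\left(-\frac{1}{2}\left\|\mathbf{y}_{n-1} - \mathcal{O}_{\theta,n} x_0\right\|^2_{\mathcal{F}_{\theta,n}}\right) , \tag{16}$$

where $\mathbf{y}_{n-1} = [y_0^T, y_1^T, \dots, y_{n-1}^T]^T$ and $\mathcal{O}_{\theta,n}$ is defined in (14).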
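The stacked representation behind (16) can be evaluated directly: conditionally on $X_0 = x_0$, the vector of the first $n$ observations is Gaussian with mean $\mathcal{O}_{\theta,n}x_0$ and covariance $\mathcal{F}_{\theta,n}$. The sketch below builds these matrices and evaluates the log-density; the function name `stacked_loglik` and the scalar example are hypothetical, and the code assumes that $S_\theta$ is a square $d_y \times d_y$ matrix.

```python
import numpy as np

def stacked_loglik(y, x0, A, R, B, S):
    """ln p_{x0}(y_0^{n-1}) for the linear Gaussian model, following (16).

    y : (n, d_y) array of observations; x0 : (d_x,) initial state.
    """
    n, dy = y.shape
    du = R.shape[1]
    powers = [np.linalg.matrix_power(A, k) for k in range(n)]
    O = np.vstack([B @ powers[k] for k in range(n)])      # observability matrix
    D = np.zeros((n * dy, n * du))
    for i in range(n):
        for j in range(i):                                # block (i, j) = B A^{i-1-j} R
            D[i * dy:(i + 1) * dy, j * du:(j + 1) * du] = B @ powers[i - 1 - j] @ R
    Sn = np.kron(np.eye(n), S)                            # block-diagonal noise matrix
    F = D @ D.T + Sn @ Sn.T                               # covariance of the stacked vector
    resid = y.reshape(-1) - O @ x0
    _, logdet = np.linalg.slogdet(F)
    quad = resid @ np.linalg.solve(F, resid)
    return -0.5 * (n * dy * np.log(2.0 * np.pi) + logdet + quad)

# Hypothetical scalar example: X_{k+1} = 0.5 X_k + U_k, Y_k = 2 X_k + V_k.
A = np.array([[0.5]])
R = np.array([[1.0]])
B = np.array([[2.0]])
S = np.array([[1.0]])
y = np.array([[0.3], [1.2]])
x0 = np.array([1.0])
ll = stacked_loglik(y, x0, A, R, B, S)
```

In the scalar example, $Y_0$ is $N(2, 1)$ and $Y_1$ is $N(1, 5)$ given $X_0 = 1$, and the two are independent, which gives a simple hand check of the value returned. For long series a Kalman filter would be the usual way to evaluate the same quantity; the dense form is used here only because it mirrors (16).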

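Consider first (A1). Under (L1), the observability matrix $\mathcal{O}_{\theta,r}$ is full rank and we have, for any compact subset $K \subset \mathsf{Y}^r$,

$$\lim_{\|x_0\|\to\infty} \inf_{y_0^{r-1} \in K} \left\|\mathbf{y}_{r-1} - \mathcal{O}_{\theta,r} x_0\right\|_{\mathcal{F}_{\theta,r}} = \infty ,$$

showing that, for all $\eta > 0$, we may choose a compact set $C$ in such a way that (5) is satisfied. It remains to prove that any compact set $C$ is an $r$-local Doeblin set satisfying the condition (6). For any $y_0^{r-1} \in \mathsf{Y}^r$ and $x_0 \in \mathsf{X}$, the measure $L^\theta\langle y_0^{r-1}\rangle(x_0, \cdot)$ is absolutely continuous with respect to the Lebesgue measure on $\mathsf{X}$, with Radon-Nikodym derivative denoted $\ell^\theta\langle y_0^{r-1}\rangle(x_0,x_r)$ and given (up to an irrelevant multiplicative factor) by

$$\ell^\theta\langle y_0^{r-1}\rangle(x_0,x_r) \propto \det{}^{-1/2}(\mathcal{G}_{\theta,r}) \exp\left(-\frac{1}{2}\left\|\begin{bmatrix}\mathbf{y}_{r-1} \\ x_r\end{bmatrix} - \begin{bmatrix}\mathcal{O}_{\theta,r} \\ A_\theta^r\end{bmatrix} x_0\right\|^2_{\mathcal{G}_{\theta,r}}\right) , \tag{17}$$

where the covariance matrix $\mathcal{G}_{\theta,r}$ is given by

$$\mathcal{G}_{\theta,r} = \begin{bmatrix}\mathcal{D}_{\theta,r} \\ \mathcal{C}_{\theta,r}\end{bmatrix}\begin{bmatrix}\mathcal{D}_{\theta,r}^T & \mathcal{C}_{\theta,r}^T\end{bmatrix} + \begin{bmatrix}\mathcal{S}_{\theta,r} \\ 0\end{bmatrix}\begin{bmatrix}\mathcal{S}_{\theta,r}^T & 0\end{bmatrix} .$$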



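The proof of (17) relies on the positivity of $\mathcal{G}_{\theta,r}$, which requires further discussion. By construction, the matrix $\mathcal{G}_{\theta,r}$ is non-negative. For any $\mathbf{y}_{r-1} \in \mathsf{Y}^r$ and $x \in \mathsf{X}$, the equation

$$\begin{bmatrix}\mathbf{y}_{r-1}^T & x^T\end{bmatrix} \mathcal{G}_{\theta,r} \begin{bmatrix}\mathbf{y}_{r-1} \\ x\end{bmatrix} = \left\|\mathcal{D}_{\theta,r}^T \mathbf{y}_{r-1} + \mathcal{C}_{\theta,r}^T x\right\|^2 + \left\|\mathcal{S}_{\theta,r}^T \mathbf{y}_{r-1}\right\|^2 = 0$$

implies that $\|\mathcal{D}_{\theta,r}^T \mathbf{y}_{r-1} + \mathcal{C}_{\theta,r}^T x\|^2 = 0$ and $\|\mathcal{S}_{\theta,r}^T \mathbf{y}_{r-1}\|^2 = 0$. Since the matrix $\mathcal{S}_{\theta,r}$ is full rank, this implies that $\mathbf{y}_{r-1} = 0$. Since $\mathcal{C}_{\theta,r}$ is full rank (the pair $[A_\theta,R_\theta]$ is controllable), this implies that $x = 0$. Therefore, the matrix $\mathcal{G}_{\theta,r}$ is positive definite and, for any $\mathbf{y}_{r-1}$, the function

$$(x_0,x_r) \mapsto \left\|\begin{bmatrix}\mathbf{y}_{r-1} \\ x_r\end{bmatrix} - \begin{bmatrix}\mathcal{O}_{\theta,r} \\ A_\theta^r\end{bmatrix} x_0\right\|^2_{\mathcal{G}_{\theta,r}}$$

is continuous, and is therefore bounded on any compact subset of $\mathsf{X} \times \mathsf{X}$. This implies that every compact set $C \subset \mathbb{R}^{d_x}$ with positive Lebesgue measure is an $r$-local Doeblin set, with $\lambda^\theta_C(\cdot) = \lambda^{\mathrm{Leb}}(\cdot \cap C)/\lambda^{\mathrm{Leb}}(C)$ and

$$\epsilon_C^-(y_0^{r-1}) = \lambda^{\mathrm{Leb}}(C) \inf_{\theta\in\Theta} \inf_{(x_0,x_r)\in C\times C} \ell^\theta\langle y_0^{r-1}\rangle(x_0,x_r) ,$$

$$\epsilon_C^+(y_0^{r-1}) = \lambda^{\mathrm{Leb}}(C) \sup_{\theta\in\Theta} \sup_{(x_0,x_r)\in C\times C} \ell^\theta\langle y_0^{r-1}\rangle(x_0,x_r) .$$

Therefore, condition (6) is satisfied with any compact set $K \subset \mathsf{Y}^r$. It remains to show (A1)-(iii). Under (L1), $L^\theta\langle y_0^{r-1}\rangle(x_0, \cdot)$ is absolutely continuous with respect to the Lebesgue measure $\lambda^{\mathrm{Leb}}$. Therefore, for any set $D$,

$$\inf_{\theta\in\Theta} \inf_{x_0 \in D} L^\theta\langle y_0^{r-1}\rangle(x_0,D) \ge \inf_{\theta\in\Theta} \inf_{(x_0,x_r)\in D\times D} \ell^\theta\langle y_0^{r-1}\rangle(x_0,x_r)\, \lambda^{\mathrm{Leb}}(D) .$$

Take $D$ to be any compact set with positive Lebesgue measure. Then

$$\sup_{\theta\in\Theta} \sup_{(x_0,x_r)\in D\times D} \left\|\begin{bmatrix}\mathbf{y}_{r-1} \\ x_r\end{bmatrix} - \begin{bmatrix}\mathcal{O}_{\theta,r} \\ A_\theta^r\end{bmatrix} x_0\right\|^2_{\mathcal{G}_{\theta,r}} \le 2 \sup_{\theta\in\Theta}\lambda_{\max}(\mathcal{G}_{\theta,r}^{-1}) \left\{\|\mathbf{y}_{r-1}\|^2 + \max_{x\in D}\|x\|^2 \left[1 + \sup_{\theta\in\Theta}\lambda_{\max}\left(\mathcal{O}_{\theta,r}^T\mathcal{O}_{\theta,r} + (A_\theta^r)^T A_\theta^r\right)\right]\right\} ,$$

where $\lambda_{\max}(A)$ is the largest eigenvalue of $A$. Under (L3), $\theta \mapsto \lambda_{\max}(\mathcal{G}_{\theta,r}^{-1})$ and $\theta \mapsto \lambda_{\max}(\mathcal{O}_{\theta,r}^T\mathcal{O}_{\theta,r} + (A_\theta^r)^T A_\theta^r)$ are bounded. Since, under (L4), $\mathbb{E}\left[\|Y_0\|^2\right] < \infty$, (A1)-(iii) is satisfied for any compact set with positive Lebesgue measure.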

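Consider now (A2). Under (L2), $S_\theta$ is full rank, and choosing the reference measure $\mu$ to be the Lebesgue measure on $\mathsf{Y}$, we find that $g^\theta(x,\cdot)$ is a Gaussian density for each $x \in \mathsf{X}$ with covariance matrix $S_\theta S_\theta^T$. We therefore have

$$\sup_{\theta\in\Theta} \sup_{(x,y)\in\mathsf{X}\times\mathsf{Y}} g^\theta(x,y) = (2\pi)^{-d_y/2} \sup_{\theta\in\Theta} \det{}^{-1/2}(S_\theta S_\theta^T) < \infty ,$$

so that (A2)-(i) and (ii) are satisfied.

We finally check (A3). For any $n \ge r$ and $x_0 \in \mathsf{X}$, the function $\theta \mapsto p^\theta_{x_0}(y_0^{n-1})$ is given by (16). Under (L3), the functions $\theta \mapsto \mathcal{O}_{\theta,n}$ (where $\mathcal{O}_{\theta,n}$ is the observability matrix defined in (14)) and $\theta \mapsto \det^{-1/2}(\mathcal{F}_{\theta,n})$ (where $\mathcal{F}_{\theta,n}$ is the covariance matrix defined in (15)) are continuous on $\Theta$ for any $n \ge r$. Thus, for any $x \in \mathsf{X}$, $\theta \mapsto p^\theta_x(y_0^{n-1})$ is continuous for every $n \ge r$, showing (A3).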

To conclude this discussion, we need to specify more explicitly the set $\mathcal{M}(D,r)$ (see (8)) of possible initial distributions. Using Proposition 3, we have to check the sufficient conditions (10) and (11). To check (10), we use Lemma 4. Note that, for any open subset $O$,

$$Q^\theta(x,O) = \mathbb{E}\left[1_O(A_\theta x + R_\theta U)\right] ,$$

where the expectation is taken with respect to the standard normal random variable $U$. Let $\{(x_n,\theta_n)\}_{n\ge 1}$ be a sequence converging to $(x,\theta)$. By the Fatou lemma, using that the function $1_O$ is lower semi-continuous and that, under (L3), $\theta \mapsto A_\theta$ and $\theta \mapsto R_\theta$ are continuous, we have

$$\liminf_{n\to\infty} Q^{\theta_n}(x_n, O) \ge \mathbb{E}\left[\liminf_{n\to\infty} 1_O(A_{\theta_n} x_n + R_{\theta_n} U)\right] \ge \mathbb{E}\left[1_O(A_\theta x + R_\theta U)\right] = Q^\theta(x,O) ,$$

showing that, for any open subset $O$, the function $(x, \theta) \mapsto Q^\theta(x, O)$ is lower semi-continuous.

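Assumption (L2) implies that, for all $(x,y) \in \mathsf{X} \times \mathsf{Y}$,

$$\ln g^\theta(x,y) \ge -\frac{d_y}{2}\ln(2\pi) + \inf_{\theta\in\Theta} \ln\det{}^{-1/2}(S_\theta S_\theta^T) - \left[\inf_{\theta\in\Theta}\lambda_{\min}(S_\theta S_\theta^T)\right]^{-1}\left(\|y\|^2 + \sup_{\theta\in\Theta}\|B_\theta\|^2\|x\|^2\right) ,$$

where $\lambda_{\min}(S_\theta S_\theta^T)$ is the minimal eigenvalue of $S_\theta S_\theta^T$. Therefore, under (L4), (11) is satisfied because $D_u$ is a compact set, $u \in \{0, \dots, r\}$.

We can therefore apply Theorem 2 to show that the MLE is consistent for any initial measure $\chi$ as soon as the process $(Y_k)_{k\in\mathbb{Z}}$ is stationary ergodic and $\mathbb{E}\left[\|Y_0\|^2\right] < \infty$.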

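4.2. Finite state models. One of the most widely used classes of HMM is obtained when the state space is finite, i.e. $\mathsf{X} = \{1, \dots, d\}$ for some integer $d$. Let $\mathsf{Y}$ be any Polish space, and let $\Theta$ be a compact metric space. For each parameter $\theta \in \Theta$, the transition kernel $Q^\theta$ is determined by the corresponding transition probability matrix $Q_\theta$, while the observation density $g^\theta$ is given as in the general setting of this paper.

It is assumed in the sequel that:

(F1) There exists an integer $r > 0$ such that $\inf_{\theta\in\Theta} \inf_{(x,x')\in\mathsf{X}\times\mathsf{X}} Q_\theta^r(x, x') > 0$.

(F2) There exists a set $M \subset \mathsf{Y}$ such that $\inf_{\theta\in\Theta} \inf_{y\in M} \inf_{x\in\mathsf{X}} g^\theta(x, y) > 0$ and $\sup_{\theta\in\Theta} \sup_{y\in M} \sup_{x\in\mathsf{X}} g^\theta(x, y) < \infty$.

(F3) For any $\theta \in \Theta$, the function $g^\theta : (x, y) \in \mathsf{X} \times \mathsf{Y} \mapsto g^\theta(x,y)$ is positive and

$$\mathbb{E}\left[\ln^+ \sup_{\theta\in\Theta} \sup_{x\in\mathsf{X}} g^\theta(x,Y_0)\right] < \infty .$$

(F4) $\mathbb{E}\left[\ln^- \inf_{\theta\in\Theta} \inf_{x\in\mathsf{X}} g^\theta(x,Y_0)\right] < \infty$.

(F5) $\theta \mapsto Q_\theta$ and $\theta \mapsto g^\theta(x, y)$ are continuous for any $x \in \mathsf{X}$, $y \in \mathsf{Y}$.

Consider first (A1). We set $C = \mathsf{X}$. Since $C^c = \emptyset$, (5) is trivially satisfied. Under (F1), Eq. (4) is satisfied with $\varphi^\theta_{\mathsf{X}}\langle y_0^{r-1}\rangle(x) = 1$, $\lambda^\theta_{\mathsf{X}}\langle y_0^{r-1}\rangle = d^{-1}\sum_{i=1}^{d} \delta_i$ and

$$\epsilon_{\mathsf{X}}^-(y_0^{r-1}) = d \prod_{i=0}^{r-1} \inf_{\theta\in\Theta} \inf_{x\in\mathsf{X}} g^\theta(x,y_i) \times \inf_{\theta\in\Theta} \inf_{(x,x')\in\mathsf{X}\times\mathsf{X}} Q_\theta^r(x,x') ,$$

$$\epsilon_{\mathsf{X}}^+(y_0^{r-1}) = d \prod_{i=0}^{r-1} \sup_{\theta\in\Theta} \sup_{x\in\mathsf{X}} g^\theta(x,y_i) \times \sup_{\theta\in\Theta} \sup_{(x,x')\in\mathsf{X}\times\mathsf{X}} Q_\theta^r(x,x') .$$

Hence, the state space $\mathsf{X}$ is an $r$-local Doeblin set. Assumption (F2) implies that (6) is satisfied with $K = M^r$. Now, note that for all $u \in \{1, \dots, r\}$ and $y_0^{u-1} \in \mathsf{Y}^u$,

$$\inf_{\theta\in\Theta} \inf_{x\in\mathsf{X}} L^\theta\langle y_0^{u-1}\rangle(x,\mathsf{X}) \ge \prod_{i=0}^{u-1} \inf_{\theta\in\Theta} \inf_{x\in\mathsf{X}} g^\theta(x,y_i) . \tag{18}$$

Using the previous inequality with $u = r$ and noting that (F4) implies that $\mathbb{E}\left[\ln^- \inf_{\theta\in\Theta} \inf_{x\in\mathsf{X}} g^\theta(x, Y_0)\right] < \infty$ shows that Eq. (7) is satisfied with $D = \mathsf{X}$. The same argument for any $u \in \{1, \dots, r\}$ shows that all the probability measures on $(\mathsf{X}, \mathcal{X})$ belong to the set $\mathcal{M}(\mathsf{X},r)$ defined in (8).

Assumption (A2) is a direct consequence of (F3). Finally, we note that the continuity of $\theta \mapsto Q_\theta$ and $\theta \mapsto g^\theta(x,y)$ yields immediately that $\theta \mapsto p^\theta_x(y_0^n)$ is a continuous function for every $n \ge 0$ and $y_0^n \in \mathsf{Y}^{n+1}$, establishing (A3).

We can therefore apply Theorem 2 under (F1)-(F5) to show that the MLE is consistent for any initial measure $\chi$ as soon as the process $(Y_k)_{k\in\mathbb{Z}}$ is stationary ergodic.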
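For a given transition matrix, condition (F1) of this section can be checked by inspecting successive matrix powers. The sketch below does so for a single hypothetical three-state matrix `Q`; the helper name `doeblin_order` and the cap `max_r` are illustrative assumptions, and checking (F1) over a whole parameter set would additionally require the infimum over $\theta$.

```python
import numpy as np

def doeblin_order(Q, max_r=50):
    """Return (r, eps) with r the smallest power such that min Q^r = eps > 0.

    Returns None if no power up to max_r has all entries positive.
    """
    P = np.eye(Q.shape[0])
    for r in range(1, max_r + 1):
        P = P @ Q
        if P.min() > 0:
            return r, P.min()
    return None

# Hypothetical transition matrix with several zero entries.
Q = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r, eps = doeblin_order(Q)
```

Here `Q` and its square both contain zero entries, while the third power is entrywise positive, so the whole state space plays the role of the local Doeblin set with $r = 3$.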

4.3. Nonlinear state space models. In this section, we consider a class of nonlinear state space models. Let $\mathsf{X} = \mathbb{R}^d$, $\mathsf{Y} = \mathbb{R}^\ell$ and let $\mathcal{X}$ and $\mathcal{Y}$ be the associated Borel $\sigma$-fields. Let $\Theta$ be a compact metric space. For each $\theta \in \Theta$ and each $x \in \mathsf{X}$, the Markov kernel $Q^\theta(x, \cdot)$ has a density $q^\theta(x, \cdot)$ with respect to the Lebesgue measure on $\mathsf{X}$.

For example, $(X_k)_{k\ge 0}$ may be defined through the nonlinear recursion

$$X_k = T_\theta(X_{k-1}) + \Sigma_\theta(X_{k-1})\, \zeta_k ,$$

where $(\zeta_k)_{k\ge 1}$ is an i.i.d. sequence of $d$-dimensional random vectors which are assumed to possess a density $\rho_\zeta$ with respect to the Lebesgue measure on $\mathbb{R}^d$, and $T_\theta : \mathbb{R}^d \to \mathbb{R}^d$, $\Sigma_\theta : \mathbb{R}^d \to \mathbb{R}^{d\times d}$ are given (measurable) functions such that for each $\theta \in \Theta$ and $x \in \mathsf{X}$, $\Sigma_\theta(x)$ is full rank. Such a model for $(X_k)_{k\ge 0}$ is sometimes known as a vector ARCH model, and covers many models of interest in time series analysis and financial econometrics. We let the reference measure $\mu$ be the Lebesgue measure on $\mathbb{R}^\ell$, and define the observed process $(Y_k)_{k\ge 0}$ by means of a given observation density $g^\theta(x, y)$.

For any positive matrix $B$, denote by $\lambda_{\min}(B)$ its minimal eigenvalue. Here $\|\cdot\|$ is any matrix norm (it is elementary that $\rho(A)$ does not depend on the choice of the norm). We now introduce the basic assumptions of this section.

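(NL1) The function $(x,x',\theta) \mapsto q^\theta(x,x')$ is a positive continuous function on $\mathsf{X} \times \mathsf{X} \times \Theta$. In addition, $\sup_{\theta\in\Theta} \sup_{(x,x')\in\mathsf{X}\times\mathsf{X}} q^\theta(x,x') < \infty$.

(NL2) For any compact subset $K \subset \mathsf{Y}$ and $\theta \in \Theta$,

$$\lim_{|x|\to\infty} \sup_{y\in K} \frac{g^\theta(x,y)}{\sup_{x'\in\mathsf{X}} g^\theta(x',y)} = 0 .$$

(NL3) For each $(x,y) \in \mathsf{X} \times \mathsf{Y}$, the function $\theta \mapsto g^\theta(x,y)$ is positive and continuous on $\Theta$. Moreover,

$$\mathbb{E}\left[\ln^+ \sup_{\theta\in\Theta} \sup_{x\in\mathsf{X}} g^\theta(x, Y_0)\right] < \infty .$$

(NL4) There exists a compact subset $D \subset \mathsf{X}$ such that

$$\mathbb{E}\left[\ln^- \inf_{\theta\in\Theta} \inf_{x\in D} g^\theta(x, Y_0)\right] < \infty .$$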



We have made no attempt at generality here: for the sake of example, we have chosen a set of conditions under which the assumptions of Theorem 2 are easily verified. Of course, the applicability of Theorem 2 extends far beyond the simple assumptions imposed in this section.

Remark 9. Nonetheless, the present assumptions already cover a broad class of nonlinear models. Consider, for example, the stochastic volatility model [?]

$$X_{k+1} = \phi_\theta X_k + \sigma_\theta \zeta_k , \qquad Y_k = \beta_\theta \exp(X_k/2)\, \epsilon_k , \tag{19}$$

where $(\zeta_k,\epsilon_k)$ are i.i.d. Gaussian random variables in $\mathbb{R}^2$ with zero mean and identity covariance matrix, $\beta_\theta > 0$, $\sigma_\theta > 0$ for every $\theta \in \Theta$, and the functions $\theta \mapsto \phi_\theta$, $\theta \mapsto \sigma_\theta$, and $\theta \mapsto \beta_\theta$ are continuous. Then, Assumptions (NL1)-(NL2) are satisfied as noted by [?, Remark 10].

Under (NL1), every compact set $C \subset \mathsf{X} = \mathbb{R}^d$ with $\lambda^{\mathrm{Leb}}(C) > 0$ is a 1-small set and therefore a local Doeblin set with $\lambda^\theta_C(\cdot) = \lambda^{\mathrm{Leb}}(\cdot \cap C)/\lambda^{\mathrm{Leb}}(C)$, $\varphi^\theta_C\langle y_0\rangle(x) = \lambda^{\mathrm{Leb}}(C)\, g^\theta(x,y_0)$ and

$$\epsilon_C^- = \inf_{\theta\in\Theta} \inf_{(x,x')\in C\times C} q^\theta(x,x') , \qquad \epsilon_C^+ = \sup_{\theta\in\Theta} \sup_{(x,x')\in C\times C} q^\theta(x,x') .$$

Under (NL1) and (NL2), (5) and (6) are satisfied with $r = 1$; Eq. (7) follows from (NL1) and (NL4). Thus assumption (A1) holds.

Assumption (A2) follows directly from (NL3). To establish (A3), it suffices to note that, under (NL1), for any $(x,x') \in \mathsf{X} \times \mathsf{X}$, $\theta \mapsto q^\theta(x,x')$ is continuous, that, under (NL3), for any $(x, y) \in \mathsf{X} \times \mathsf{Y}$, $\theta \mapsto g^\theta(x,y)$ is continuous, and that for any $n \in \mathbb{N}$, $\sup_{\theta\in\Theta} \sup_{x\in\mathsf{X}} \prod_{s=0}^{n} g^\theta(x,Y_s) < \infty$, $\mathbb{P}$-a.s. The bounded convergence theorem shows that, $\mathbb{P}$-a.s., the function $\theta \mapsto p^\theta_x(Y_0^n)$ is continuous.

Finally, under (NL1)-(NL4), according to Theorem 2 and Proposition 3, the MLE is consistent for any initial measure $\chi$ such that $\chi(D) > 0$.
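The stochastic volatility model mentioned in Remark 9 is easy to simulate, which gives a concrete instance of the nonlinear class of this section. The sketch below assumes the standard form of that model, namely a Gaussian AR(1) log-volatility $X_{k+1} = \phi X_k + \sigma \zeta_k$ observed through $Y_k = \beta \exp(X_k/2)\epsilon_k$, with $|\phi| < 1$ so that a stationary start is available; the function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_sv(n, phi, sigma, beta, rng):
    """Simulate n steps of X_{k+1} = phi X_k + sigma zeta_k, Y_k = beta exp(X_k/2) eps_k."""
    x = np.empty(n)
    # Stationary start N(0, sigma^2 / (1 - phi^2)); this assumes |phi| < 1.
    x[0] = rng.normal(scale=sigma / np.sqrt(1.0 - phi ** 2))
    for k in range(n - 1):
        x[k + 1] = phi * x[k] + sigma * rng.normal()
    y = beta * np.exp(x / 2.0) * rng.normal(size=n)
    return x, y

def log_g(x, y, beta):
    """ln g(x, y): given X_k = x, Y_k is N(0, beta^2 exp(x))."""
    var = beta ** 2 * np.exp(x)
    return -0.5 * (np.log(2.0 * np.pi * var) + y ** 2 / var)

rng = np.random.default_rng(0)
x, y = simulate_sv(1000, phi=0.9, sigma=0.3, beta=1.0, rng=rng)
emission_loglik = log_g(x, y, beta=1.0)
```

The observation density is positive and continuous in the parameters, in line with the positivity and continuity requirements of (NL3); the moment conditions depend on the law of the observed process and are not checked by this sketch.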



5. Proof of Proposition 1 and Theorem 2.



5.1. Block decomposition. The first step of the proof consists in splitting
the observations into blocks of size $r$, where $r$ is defined in (A1). More
precisely, we will first show the equivalent of Proposition 1 and Theorem 2
with $Y_i$ replaced by $Z_i = Y_{ir}^{(i+1)r-1}$. With this notation,

$$
\hat\theta_{\chi,nr} = \arg\max_{\theta \in \Theta} \ln p^\theta_\chi(Y_0^{nr-1}) = \arg\max_{\theta \in \Theta} \ln p^\theta_\chi(Z_0^{n-1}) .
$$

In the following, $\hat\theta_{\chi,nr}$ is called the block Maximum Likelihood Estimator (denoted
hereafter as the block MLE) associated to the observations $Z_0, \dots, Z_{n-1}$.
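
The block construction can be made concrete with a short sketch. The helper `to_blocks` below is a hypothetical name; it groups a sequence of observations into consecutive blocks of length `r`, dropping a trailing incomplete block, which is one possible convention and not something the paper prescribes.

```python
import numpy as np

def to_blocks(y, r):
    """Return the blocks Z_i = (Y_{ir}, ..., Y_{(i+1)r-1}) as rows of an array.

    A trailing incomplete block is dropped (an assumption of this sketch).
    """
    y = np.asarray(y)
    n_blocks = len(y) // r
    return y[: n_blocks * r].reshape(n_blocks, r, *y.shape[1:])

# Ten observations and r = 3 give three complete blocks
z = to_blocks(np.arange(10), r=3)
```

With this grouping, a likelihood evaluated on the first `n*r` observations can equally be read as a likelihood of `n` blocks.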

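5.1.1. Forgetting of the initial distribution for the block conditional likelihood.
Denote, for $i \in \mathbb{Z}$,

$$
z_i = y_{ir}^{(i+1)r-1} \in \mathsf{Y}^r . \tag{20}
$$

The likelihood $p^\theta_\chi(z_0^{n-1})$ may then be rewritten as

$$
p^\theta_\chi(z_0^{n-1}) = p^\theta_\chi(y_0^{nr-1}) = \chi L^\theta\langle z_0 \rangle \cdots L^\theta\langle z_{n-1} \rangle \mathbf{1}_{\mathsf{X}} = \chi L^\theta\langle z_0^{n-1} \rangle \mathbf{1}_{\mathsf{X}} , \tag{21}
$$

where $L^\theta\langle z_0^{n-1} \rangle = L^\theta\langle y_0^{nr-1} \rangle$ is the kernel defined previously.

For any sequence $\{z_i\}_{i \ge 0} \in \mathsf{Z}^{\mathbb{N}}$ where $\mathsf{Z} = \mathsf{Y}^r$, any probability measures
$\chi$ and $\chi'$ on $(\mathsf{X}, \mathcal{X})$ and any measurable nonnegative functions $f$ and $h$ from
$\mathsf{X}$ to $\mathbb{R}^+$, define

$$
\Delta^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle(f,h) = (\chi L^\theta\langle z_0^{n-1} \rangle f)(\chi' L^\theta\langle z_0^{n-1} \rangle h) - (\chi L^\theta\langle z_0^{n-1} \rangle h)(\chi' L^\theta\langle z_0^{n-1} \rangle f) . \tag{22}
$$

Let $\bar{\mathsf{X}} = \mathsf{X} \times \mathsf{X}$ and $\bar{\mathcal{X}} = \mathcal{X} \otimes \mathcal{X}$. For $P$ a (possibly unnormalized) kernel on $(\mathsf{X}, \mathcal{X})$, we
denote by $\bar P$ the transition kernel on $(\bar{\mathsf{X}}, \bar{\mathcal{X}})$ defined, for any $(x,x') \in \bar{\mathsf{X}}$ and
$A, A' \in \mathcal{X}$, by

$$
\bar P[(x,x'), A \times A'] = P(x,A) P(x',A') . \tag{23}
$$

If $\chi$ and $\chi'$ are two probability measures on $(\mathsf{X}, \mathcal{X})$, define $\chi \otimes \chi'$ as the measure
on $(\bar{\mathsf{X}}, \bar{\mathcal{X}})$ given, for all $\bar A \in \bar{\mathcal{X}}$, by $\chi \otimes \chi'(\bar A) = \iint \chi(dx)\chi'(dx') \mathbf{1}_{\bar A}(x,x')$. With
the notations introduced above, (22) can be rewritten as follows:

$$
\Delta^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle(f,h) = \int \cdots \int \chi \otimes \chi'(d\bar w_0) \left( \prod_{i=0}^{n-1} \bar L^\theta\langle z_i \rangle(\bar w_i, d\bar w_{i+1}) \right) \{ f \otimes h - h \otimes f \}(\bar w_n) , \tag{24}
$$

where for $\bar w = (w,w') \in \bar{\mathsf{X}}$, $f \otimes h(\bar w) = f(w) h(w')$.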
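
On a finite state space the kernels become matrices and the product kernel becomes a Kronecker product, so the equivalence between the defining formula for the quantity $\Delta$ and its product-space representation can be checked numerically. The sketch below is only an illustration under that finite-state assumption; the matrices, measures and functions are randomly generated placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 4
# Hypothetical unnormalised kernels L<z_0>, ..., L<z_{n-1}> on a 3-point state space
kernels = [rng.uniform(0.1, 1.0, size=(d, d)) for _ in range(n)]
chi, chi_p = rng.dirichlet(np.ones(d)), rng.dirichlet(np.ones(d))
f, h = rng.uniform(size=d), rng.uniform(size=d)

# Direct definition: compose the n kernels, then take the antisymmetrised difference
L = np.linalg.multi_dot(kernels)
delta_direct = (chi @ L @ f) * (chi_p @ L @ h) - (chi @ L @ h) * (chi_p @ L @ f)

# Product-space representation: kernels act on pairs (w, w') through Kronecker products
L_bar = np.linalg.multi_dot([np.kron(K, K) for K in kernels])
phi = np.kron(f, h) - np.kron(h, f)
delta_product = np.kron(chi, chi_p) @ L_bar @ phi
```

The two numbers agree up to rounding, which reflects the mixed-product property of the Kronecker product.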

The following proposition extends Proposition 12]. 

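Proposition 5. Assume (A1). Let $0 < \gamma^- < \gamma^+ < 1$. Then, for any
$\eta > 0$, there exists $\rho \in (0,1)$ such that, for any sequence $(z_i)_{i \ge 0} \in \mathsf{Z}^{\mathbb{N}}$
satisfying

$$
n^{-1} \sum_{i=0}^{n-1} \mathbf{1}_{\mathsf{K}}(z_i) \ge \max\left(1 - \gamma^-, (1 + \gamma^+)/2\right) , \tag{25}
$$

for any $\beta \in (\gamma^-, \gamma^+)$, any nonnegative bounded functions $f$ and $h$, any
probability measures $\chi$ and $\chi'$ on $(\mathsf{X}, \mathcal{X})$ and any $\theta \in \Theta$,

$$
\left| \Delta^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle(f,h) \right|
\le \rho^{\lfloor n(\beta - \gamma^-) \rfloor} \left\{ (\chi L^\theta\langle z_0^{n-1} \rangle f)(\chi' L^\theta\langle z_0^{n-1} \rangle h) + (\chi' L^\theta\langle z_0^{n-1} \rangle f)(\chi L^\theta\langle z_0^{n-1} \rangle h) \right\}
+ 2 \eta^{\lfloor n(\gamma^+ - \beta) \rfloor / 2} \prod_{i=0}^{n-1} \left| L^\theta\langle z_i \rangle(\cdot, \mathsf{X}) \right|_\infty^2 \, |f|_\infty |h|_\infty .
$$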
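Proof. Let $\eta > 0$. According to (A1), there exists a set $\mathsf{C} \subset \mathsf{X}$ such
that (5) and (6) hold. Denote $\bar{\mathsf{C}} = \mathsf{C} \times \mathsf{C}$ and, for $z = y_0^{r-1}$, set $\bar\varphi^\theta_{\bar{\mathsf{C}}}\langle z \rangle = \varphi^\theta_{\mathsf{C}}\langle z \rangle \otimes \varphi^\theta_{\mathsf{C}}\langle z \rangle$
and $\bar\lambda^\theta_{\bar{\mathsf{C}}}\langle z \rangle = \lambda^\theta_{\mathsf{C}}\langle z \rangle \otimes \lambda^\theta_{\mathsf{C}}\langle z \rangle$, where $\varphi^\theta_{\mathsf{C}}\langle z \rangle$ and $\lambda^\theta_{\mathsf{C}}\langle z \rangle$ are defined
in Definition 1. For any measurable nonnegative function $f$ on $(\bar{\mathsf{X}}, \bar{\mathcal{X}})$, $\theta \in \Theta$
and $\bar x \in \bar{\mathsf{C}}$,

$$
(\epsilon^-_{\mathsf{C}}(z))^2 \, \bar\varphi^\theta_{\bar{\mathsf{C}}}\langle z \rangle(\bar x) \, \bar\lambda^\theta_{\bar{\mathsf{C}}}\langle z \rangle(\mathbf{1}_{\bar{\mathsf{C}}} f)
\le \delta_{\bar x} \bar L^\theta\langle z \rangle(\mathbf{1}_{\bar{\mathsf{C}}} f)
\le (\epsilon^+_{\mathsf{C}}(z))^2 \, \bar\varphi^\theta_{\bar{\mathsf{C}}}\langle z \rangle(\bar x) \, \bar\lambda^\theta_{\bar{\mathsf{C}}}\langle z \rangle(\mathbf{1}_{\bar{\mathsf{C}}} f) . \tag{26}
$$

Define the unnormalised kernels $\bar L^{\theta,0}\langle z \rangle$ and $\bar L^{\theta,1}\langle z \rangle$ on $(\bar{\mathsf{X}}, \bar{\mathcal{X}})$ as follows: for
all $\bar x \in \bar{\mathsf{X}}$ and $\bar A \in \bar{\mathcal{X}}$,

$$
\bar L^{\theta,0}\langle z \rangle(\bar x, \bar A) = \mathbf{1}_{\bar{\mathsf{C}}}(\bar x) \, (\epsilon^-_{\mathsf{C}}(z))^2 \, \bar\varphi^\theta_{\bar{\mathsf{C}}}\langle z \rangle(\bar x) \, \bar\lambda^\theta_{\bar{\mathsf{C}}}\langle z \rangle(\bar{\mathsf{C}} \cap \bar A) , \tag{27}
$$

$$
\bar L^{\theta,1}\langle z \rangle(\bar x, \bar A) = \bar L^\theta\langle z \rangle(\bar x, \bar A) - \bar L^{\theta,0}\langle z \rangle(\bar x, \bar A) . \tag{28}
$$

Eq. (26) implies that, for all $\bar x \in \bar{\mathsf{C}}$ and any measurable nonnegative function
$f$,

$$
0 \le \delta_{\bar x} \bar L^{\theta,1}\langle z \rangle(\mathbf{1}_{\bar{\mathsf{C}}} f) \le \rho_{\mathsf{C}}(z) \, \delta_{\bar x} \bar L^\theta\langle z \rangle(\mathbf{1}_{\bar{\mathsf{C}}} f) ,
$$

where $\rho_{\mathsf{C}}(z) = 1 - (\epsilon^-_{\mathsf{C}}(z)/\epsilon^+_{\mathsf{C}}(z))^2$. It then follows that

$$
\begin{aligned}
\delta_{\bar x} \bar L^{\theta,1}\langle z \rangle(f)
&= \mathbf{1}_{\bar{\mathsf{C}}}(\bar x) \, \delta_{\bar x} \bar L^{\theta,1}\langle z \rangle(\mathbf{1}_{\bar{\mathsf{C}}} f) + \mathbf{1}_{\bar{\mathsf{C}}}(\bar x) \, \delta_{\bar x} \bar L^{\theta,1}\langle z \rangle(\mathbf{1}_{\bar{\mathsf{C}}^c} f) + \mathbf{1}_{\bar{\mathsf{C}}^c}(\bar x) \, \delta_{\bar x} \bar L^{\theta,1}\langle z \rangle(f) \\
&\le \rho_{\mathsf{C}}(z) \mathbf{1}_{\bar{\mathsf{C}}}(\bar x) \, \delta_{\bar x} \bar L^{\theta}\langle z \rangle(\mathbf{1}_{\bar{\mathsf{C}}} f) + \mathbf{1}_{\bar{\mathsf{C}}}(\bar x) \, \delta_{\bar x} \bar L^{\theta}\langle z \rangle(\mathbf{1}_{\bar{\mathsf{C}}^c} f) + \mathbf{1}_{\bar{\mathsf{C}}^c}(\bar x) \, \delta_{\bar x} \bar L^{\theta}\langle z \rangle(f) \\
&\le \delta_{\bar x} \bar L^{\theta}\langle z \rangle\left( \rho_{\mathsf{C}}(z)^{\mathbf{1}_{\bar{\mathsf{C}}}(\bar x) \mathbf{1}_{\bar{\mathsf{C}}}(\cdot)} f \right) .
\end{aligned}
\tag{29}
$$

Note that $\Delta^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle(f,h)$ may be decomposed as

$$
\Delta^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle(f,h) = \sum_{t_0^{n-1} \in \{0,1\}^n} \Delta^{\theta, t_0^{n-1}}_{\chi,\chi'}\langle z_0^{n-1} \rangle(f,h) ,
$$

where

$$
\Delta^{\theta, t_0^{n-1}}_{\chi,\chi'}\langle z_0^{n-1} \rangle(f,h) = \int \cdots \int \chi \otimes \chi'(d\bar w_0) \left( \prod_{i=0}^{n-1} \bar L^{\theta,t_i}\langle z_i \rangle(\bar w_i, d\bar w_{i+1}) \right) \Phi(\bar w_n) ,
$$

with $\Phi = f \otimes h - h \otimes f$. First assume that there exists an index $i \in \{0, \dots, n-1\}$
such that $t_i = 0$; then,

$$
\Delta^{\theta, t_0^{n-1}}_{\chi,\chi'}\langle z_0^{n-1} \rangle(f,h) = \chi \otimes \chi' \left( \bar L^{\theta,t_0}\langle z_0 \rangle \cdots \bar L^{\theta,t_{i-1}}\langle z_{i-1} \rangle (\mathbf{1}_{\bar{\mathsf{C}}} \times \bar\varphi^\theta_{\bar{\mathsf{C}}}\langle z_i \rangle) \right)
\times (\epsilon^-_{\mathsf{C}}(z_i))^2 \, \bar\lambda^\theta_{\bar{\mathsf{C}}}\langle z_i \rangle \left( \mathbf{1}_{\bar{\mathsf{C}}} \bar L^{\theta,t_{i+1}}\langle z_{i+1} \rangle \cdots \bar L^{\theta,t_{n-1}}\langle z_{n-1} \rangle \Phi \right) .
$$

By symmetry,

$$
\bar\lambda^\theta_{\bar{\mathsf{C}}}\langle z_i \rangle \left( \mathbf{1}_{\bar{\mathsf{C}}} \bar L^{\theta,t_{i+1}}\langle z_{i+1} \rangle \cdots \bar L^{\theta,t_{n-1}}\langle z_{n-1} \rangle \Phi \right) = 0 ,
$$

showing that $\Delta^{\theta, t_0^{n-1}}_{\chi,\chi'}\langle z_0^{n-1} \rangle(f,h) = 0$ except if $t_i = 1$ for all $i \in \{0, \dots, n-1\}$.
Therefore,

$$
\Delta^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle(f,h) = \chi \otimes \chi' \left( \bar L^{\theta,1}\langle z_0 \rangle \cdots \bar L^{\theta,1}\langle z_{n-1} \rangle \Phi \right) .
$$

It implies, using (29), that

$$
\left| \Delta^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle(f,h) \right| \le \chi \otimes \chi' \left( \bar L^{\theta,1}\langle z_0 \rangle \cdots \bar L^{\theta,1}\langle z_{n-1} \rangle |\Phi| \right)
\le \int \cdots \int \chi \otimes \chi'(d\bar w_0) \prod_{i=0}^{n-1} \bar L^\theta\langle z_i \rangle(\bar w_i, d\bar w_{i+1}) \, (\rho_{\mathsf{C}}(z_i))^{\mathbf{1}_{\bar{\mathsf{C}} \times \bar{\mathsf{C}}}(\bar w_i, \bar w_{i+1})} \, |\Phi|(\bar w_n) . \tag{30}
$$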



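Note that

$$
\prod_{i=0}^{n-1} (\rho_{\mathsf{C}}(z_i))^{\mathbf{1}_{\bar{\mathsf{C}} \times \bar{\mathsf{C}}}(\bar w_i, \bar w_{i+1})} \le \rho^{\sum_{i=0}^{n-1} \mathbf{1}_{\bar{\mathsf{C}} \times \bar{\mathsf{C}}}(\bar w_i, \bar w_{i+1}) \mathbf{1}_{\mathsf{K}}(z_i)} , \tag{31}
$$

where $\rho = \sup_{z \in \mathsf{K}} \rho_{\mathsf{C}}(z) < 1$ under (A1). For any sequence $z_0^{n-1}$ such
that $n^{-1} \sum_{i=0}^{n-1} \mathbf{1}_{\mathsf{K}}(z_i) \ge (1 - \gamma^-)$, we have $\sum_{i=0}^{n-1} \mathbf{1}_{\mathsf{K}^c}(z_i) \le n\gamma^-$, so that
$\sum_{i=0}^{n-1} \mathbf{1}_{\mathsf{K}^c}(z_i) \le \lfloor n\gamma^- \rfloor$. Moreover, we have

$$
\sum_{i=0}^{n-1} \mathbf{1}_{\bar{\mathsf{C}} \times \bar{\mathsf{C}}}(\bar w_i, \bar w_{i+1}) \mathbf{1}_{\mathsf{K}}(z_i)
\ge \sum_{i=0}^{n-1} \mathbf{1}_{\bar{\mathsf{C}} \times \bar{\mathsf{C}}}(\bar w_i, \bar w_{i+1}) - \sum_{i=0}^{n-1} \mathbf{1}_{\mathsf{K}^c}(z_i)
\ge N_{\bar{\mathsf{C}},n}(\bar w_0^n) - \lfloor n\gamma^- \rfloor , \tag{32}
$$

where, for any set $A \in \bar{\mathcal{X}}$, $N_{A,n}(\bar w_0^n) = \sum_{i=0}^{n-1} \mathbf{1}_{A \times A}(\bar w_i, \bar w_{i+1})$. By combining
(31) and (32) and using that $\lfloor n\beta \rfloor - \lfloor n\gamma^- \rfloor \ge \lfloor n(\beta - \gamma^-) \rfloor$, we therefore
obtain, for any $\beta \in (\gamma^-, 1]$,

$$
\prod_{i=0}^{n-1} (\rho_{\mathsf{C}}(z_i))^{\mathbf{1}_{\bar{\mathsf{C}} \times \bar{\mathsf{C}}}(\bar w_i, \bar w_{i+1})} \le \rho^{\lfloor n(\beta - \gamma^-) \rfloor} + \mathbf{1}\left\{ N_{\bar{\mathsf{C}},n}(\bar w_0^n) < \lfloor n\beta \rfloor \right\} . \tag{33}
$$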
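For any sequence $\bar w_0^{n-1} \in \bar{\mathsf{X}}^n$ and any $A \in \bar{\mathcal{X}}$, denote

$$
M_{A,n}(\bar w_0^{n-1}) = \sum_{i=0}^{n-1} \mathbf{1}_A(\bar w_i) .
$$

Using [?, Lemma 17], for any sequence $\bar w_0^n$ satisfying $N_{\bar{\mathsf{C}},n}(\bar w_0^n) < \lfloor n\beta \rfloor$, which
is equivalent to $N_{\bar{\mathsf{C}},n}(\bar w_0^n) \le \lfloor n\beta \rfloor - 1$, we have $M_{\bar{\mathsf{C}},n}(\bar w_0^{n-1}) \le (\lfloor n\beta \rfloor + n)/2$,
so that

$$
N_{\bar{\mathsf{C}},n}(\bar w_0^n) < \lfloor n\beta \rfloor \implies M_{\bar{\mathsf{C}}^c,n}(\bar w_0^{n-1}) \ge a_n := \frac{n - \lfloor n\beta \rfloor}{2} . \tag{34}
$$

In words, either the number of consecutive visits to the set $\bar{\mathsf{C}}$ is at least
$\lfloor n\beta \rfloor$, or the number of visits to the complement of the set $\bar{\mathsf{C}}$ is at least
$a_n$. Plugging (34) into (33) and combining it with (30) yields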

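$$
\left| \Delta^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle(f,h) \right| \le \rho^{\lfloor n(\beta - \gamma^-) \rfloor} \, \chi \otimes \chi' \left( \bar L^\theta\langle z_0 \rangle \cdots \bar L^\theta\langle z_{n-1} \rangle |\Phi| \right)
+ 2 |f|_\infty |h|_\infty \, \Gamma^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle ,
$$

where

$$
\Gamma^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle = \int \cdots \int \chi \otimes \chi'(d\bar w_0) \prod_{i=0}^{n-1} \bar L^\theta\langle z_i \rangle(\bar w_i, d\bar w_{i+1}) \, \mathbf{1}\left\{ M_{\bar{\mathsf{C}}^c,n}(\bar w_0^{n-1}) \ge a_n \right\} .
$$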

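We finally have to bound this last term. First rewrite $\Gamma^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle$ as follows:

$$
\Gamma^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle = \int \cdots \int \chi \otimes \chi'(d\bar w_0) \prod_{i=0}^{n-1} \frac{\bar L^\theta\langle z_i \rangle(\bar w_i, d\bar w_{i+1})}{\eta^{\mathbf{1}_{\bar{\mathsf{C}}^c}(\bar w_i) \mathbf{1}_{\mathsf{K}}(z_i)} \left| L^\theta\langle z_i \rangle(\cdot, \mathsf{X}) \right|_\infty^2}
\times \prod_{i=0}^{n-1} \eta^{\mathbf{1}_{\bar{\mathsf{C}}^c}(\bar w_i) \mathbf{1}_{\mathsf{K}}(z_i)} \left| L^\theta\langle z_i \rangle(\cdot, \mathsf{X}) \right|_\infty^2 \; \mathbf{1}\left\{ M_{\bar{\mathsf{C}}^c,n}(\bar w_0^{n-1}) \ge a_n \right\} .
$$

Note that (25) implies that $\sum_{i=0}^{n-1} \mathbf{1}_{\mathsf{K}}(z_i) \ge (n + \lfloor n\gamma^+ \rfloor)/2$. Then, for any
$\gamma^+ > \beta$, the inequality $M_{\bar{\mathsf{C}}^c,n}(\bar w_0^{n-1}) \ge a_n$ implies that

$$
\sum_{i=0}^{n-1} \mathbf{1}_{\bar{\mathsf{C}}^c}(\bar w_i) \mathbf{1}_{\mathsf{K}}(z_i) \ge \sum_{i=0}^{n-1} \mathbf{1}_{\bar{\mathsf{C}}^c}(\bar w_i) - \sum_{i=0}^{n-1} \mathbf{1}_{\mathsf{K}^c}(z_i) \ge a_n - \frac{n - \lfloor n\gamma^+ \rfloor}{2} \ge \frac{\lfloor n(\gamma^+ - \beta) \rfloor}{2} ,
$$

showing that

$$
\Gamma^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle \le \eta^{\lfloor n(\gamma^+ - \beta) \rfloor / 2} \prod_{i=0}^{n-1} \left| L^\theta\langle z_i \rangle(\cdot, \mathsf{X}) \right|_\infty^2 \int \cdots \int \chi \otimes \chi'(d\bar w_0) \prod_{i=0}^{n-1} \frac{\bar L^\theta\langle z_i \rangle(\bar w_i, d\bar w_{i+1})}{\eta^{\mathbf{1}_{\bar{\mathsf{C}}^c}(\bar w_i) \mathbf{1}_{\mathsf{K}}(z_i)} \left| L^\theta\langle z_i \rangle(\cdot, \mathsf{X}) \right|_\infty^2} .
$$

The proof follows noting that, for any $\bar w = (w, w') \in \bar{\mathsf{X}}$ and $z \in \mathsf{K}$, (3) and
(5) imply

$$
\frac{\bar L^\theta\langle z \rangle(\bar w, \bar{\mathsf{X}})}{\eta^{\mathbf{1}_{\bar{\mathsf{C}}^c}(\bar w) \mathbf{1}_{\mathsf{K}}(z)} \left| L^\theta\langle z \rangle(\cdot, \mathsf{X}) \right|_\infty^2}
= \frac{L^\theta\langle z \rangle(w, \mathsf{X}) \, L^\theta\langle z \rangle(w', \mathsf{X})}{\eta^{\mathbf{1}_{\bar{\mathsf{C}}^c}(\bar w) \mathbf{1}_{\mathsf{K}}(z)} \left| L^\theta\langle z \rangle(\cdot, \mathsf{X}) \right|_\infty^2} \le 1 .
$$

□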



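Lemma 6. Let $(U_k)_{k \in \mathbb{Z}}$, $(V_k)_{k \in \mathbb{Z}}$, $(W_k)_{k \in \mathbb{Z}}$ be stationary sequences such
that

$$
\mathbb{E}[\ln^+ U_0] < \infty, \quad \mathbb{E}[\ln^+ V_0] < \infty, \quad \mathbb{E}[\ln^+ W_0] < \infty .
$$

Then, for all $\eta, \rho$ in $(0,1)$ such that $-\ln \eta > \mathbb{E}[\ln^+ V_0]$, there exist a $\mathbb{P}$-a.s.
finite random variable $D$ and a constant $\varrho \in (0,1)$ such that for all $k \ge 1$,
$m \ge 0$,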

pk+m ^ ( Y[ v)j Uk < q''+"'D , F-a.s. 

\i=—m / 

Proof. Let $\alpha \in (0,1)$ be such that $\mathbb{E}[\ln^+ V_0] < -\ln \alpha < -\ln \eta$ and let
$\varrho > 0$ be such that $(\eta/\alpha) \vee \rho < \varrho < 1$; then




with 




We now show that $D$ is $\mathbb{P}$-a.s. finite. First note that combining the
bound $\mathbb{E}[\ln^+ U_0] < \infty$ with Lemma 7 (stated and proved below), we obtain
that the random variable $\sup_{k \ge 1} \alpha^k U_k$ is $\mathbb{P}$-a.s. finite; in the same
way, $\sup_{m \ge 0} \alpha^m W_{-m}$ is $\mathbb{P}$-a.s. finite. Moreover, since $\mathbb{E}[\ln^+ V_0] < \infty$,
Birkhoff's ergodic theorem ensures that

$$
\frac{1}{k-1} \sum_{i=1}^{k-1} \ln^+ V_i \to_{k \to \infty} \mathbb{E}[\ln^+ V_0] < -\ln \alpha , \quad \mathbb{P}\text{-a.s.}
$$

By taking the exponential function in the previous limit, we obtain that

$$
\prod_{i=1}^{k-1} (\alpha V_i) \le \exp\left\{ (k-1) \left[ \frac{1}{k-1} \sum_{i=1}^{k-1} \ln^+ V_i + \ln \alpha \right] \right\} \to_{k \to \infty} 0 , \quad \mathbb{P}\text{-a.s.},
$$

so that $\sup_{k \ge 1} \prod_{i=1}^{k-1} (\alpha V_i)$ is $\mathbb{P}$-a.s. finite. Following the same arguments,
$\sup_{m \ge 0} \prod_{i=-m}^{0} (\alpha V_i)$ is $\mathbb{P}$-a.s. finite. Finally, $D$ is $\mathbb{P}$-a.s. finite. The proof
is completed. □

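Lemma 7. Let $\{Z_k\}_{k \in \mathbb{Z}}$ be a sequence of non-negative random variables
on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ such that for any $k \in \mathbb{Z}$ and any measurable
nonnegative function $f$, $\mathbb{E}[f(Z_k)] = \mathbb{E}[f(Z_0)]$.

(i) Assume that $\mathbb{E}[(\ln Z_0)^+] < \infty$. Then, for all $\beta \in (0,1)$, $\sup_{k \ge 0} \beta^k Z_k < \infty$,
$\mathbb{P}$-a.s.

(ii) Assume that $\mathbb{E}[|\ln Z_0|] < \infty$. Then, for all $\beta \in (0,1)$, $\sup_{k \in \mathbb{Z}} \beta^{|k|} Z_k < \infty$
and $\inf_{k \in \mathbb{Z}} \beta^{-|k|} Z_k > 0$, $\mathbb{P}$-a.s.

Proof. Let $\beta \in (0,1)$. Since

$$
\mathbb{P}[\beta^k Z_k \ge 1] = \mathbb{P}[\ln Z_k/(-\ln \beta) \ge k] = \mathbb{P}[\ln Z_0/(-\ln \beta) \ge k] ,
$$

it follows that

$$
\sum_{k=0}^{\infty} \mathbb{P}[\beta^k Z_k \ge 1] = \sum_{k=0}^{\infty} \mathbb{P}[\ln Z_0/(-\ln \beta) \ge k] \le 1 + \mathbb{E}[(\ln Z_0)^+]/(-\ln \beta) < \infty .
$$

The proof of (i) is completed by using the Borel-Cantelli Lemma. Now, (ii)
can be easily derived by noting that if $\mathbb{E}[|\ln Z_0|] < \infty$, then one may use
(i) twice, first by replacing $Z_k$ by $Z_{-k}$ and then by replacing $Z_k$ by $1/Z_k$. □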

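Proposition 8. Assume (A1)-(A2). There exist a constant $\kappa \in (0,1)$ and an
integer-valued random variable $K$ satisfying $\mathbb{P}[K < \infty] = 1$ such that, for
any initial distributions $\chi, \chi' \in \mathcal{M}(\mathsf{D}, r)$ (where $\mathcal{M}(\mathsf{D}, r)$ is defined in (8)),

$$
\sup_{\theta \in \Theta} \sup_{k \ge K} \sup_{m \ge 0} \kappa^{-(m+k)} \left| \ln p^\theta_\chi(Z_k | Z_{-m}^{k-1}) - \ln p^\theta_{\chi'}(Z_k | Z_{-m}^{k-1}) \right| < \infty , \quad \mathbb{P}\text{-a.s.} \tag{35}
$$

$$
\sup_{\theta \in \Theta} \sup_{k \ge K} \sup_{m \ge 0} \kappa^{-(m+k)} \left| \ln p^\theta_\chi(Z_k | Z_{-m}^{k-1}) - \ln p^\theta_\chi(Z_k | Z_{-m-1}^{k-1}) \right| < \infty , \quad \mathbb{P}\text{-a.s.} \tag{36}
$$

$$
\sup_{\theta \in \Theta} \sup_{m \ge 0} \kappa^{-m} \left| \ln p^\theta_\chi(Z_0 | Z_{-m}^{-1}) - \ln p^\theta_\chi(Z_0 | Z_{-m-1}^{-1}) \right| < \infty , \quad \mathbb{P}\text{-a.s.} \tag{37}
$$

Proof.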

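Proof of (35). It follows from (21) that, for any integers $(m,k) \in \mathbb{N}^2$ and
any sequence $z_{-m}^{k}$,

$$
p^\theta_\chi(z_k | z_{-m}^{k-1}) = \frac{\chi L^\theta\langle z_{-m}^{k-1} \rangle (L^\theta\langle z_k \rangle \mathbf{1}_{\mathsf{X}})}{\chi L^\theta\langle z_{-m}^{k-1} \rangle (\mathbf{1}_{\mathsf{X}})} .
$$

Since, for any $a, b > 0$, $\ln(a) - \ln(b) \le (a-b)/b$, the definition (22) implies
that

$$
\ln p^\theta_\chi(z_k | z_{-m}^{k-1}) - \ln p^\theta_{\chi'}(z_k | z_{-m}^{k-1}) \le \frac{\Delta^\theta_{\chi,\chi'}\langle z_{-m}^{k-1} \rangle (L^\theta\langle z_k \rangle \mathbf{1}_{\mathsf{X}}, \mathbf{1}_{\mathsf{X}})}{\chi L^\theta\langle z_{-m}^{k-1} \rangle (\mathbf{1}_{\mathsf{X}}) \times \chi' L^\theta\langle z_{-m}^{k-1} \rangle (L^\theta\langle z_k \rangle \mathbf{1}_{\mathsf{X}})} . \tag{38}
$$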

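Let $0 < \gamma^- < \gamma^+ < 1$. By Proposition 5, for any $\eta > 0$ and $\beta \in (\gamma^-, \gamma^+)$
there exists $\rho \in (0,1)$ such that, for any sequence $z_{-m}^{k}$ satisfying

$$
(m+k)^{-1} \sum_{i=-m}^{k-1} \mathbf{1}_{\mathsf{K}}(z_i) \ge \max\left(1 - \gamma^-, (1 + \gamma^+)/2\right) , \tag{39}
$$

we have

$$
\frac{\Delta^\theta_{\chi,\chi'}\langle z_{-m}^{k-1} \rangle (L^\theta\langle z_k \rangle \mathbf{1}_{\mathsf{X}}, \mathbf{1}_{\mathsf{X}})}{\chi L^\theta\langle z_{-m}^{k-1} \rangle (\mathbf{1}_{\mathsf{X}}) \times \chi' L^\theta\langle z_{-m}^{k-1} \rangle (L^\theta\langle z_k \rangle \mathbf{1}_{\mathsf{X}})}
\le \rho^{a(m+k)} \left\{ 1 + \frac{\chi L^\theta\langle z_{-m}^{k-1} \rangle (L^\theta\langle z_k \rangle \mathbf{1}_{\mathsf{X}}) \times \chi' L^\theta\langle z_{-m}^{k-1} \rangle (\mathbf{1}_{\mathsf{X}})}{\chi L^\theta\langle z_{-m}^{k-1} \rangle (\mathbf{1}_{\mathsf{X}}) \times \chi' L^\theta\langle z_{-m}^{k-1} \rangle (L^\theta\langle z_k \rangle \mathbf{1}_{\mathsf{X}})} \right\} + 2 \eta^{b(m+k)} C_{m,k} , \tag{40}
$$

where $a(n) = \lfloor n(\beta - \gamma^-) \rfloor$, $b(n) = \lfloor n(\gamma^+ - \beta) \rfloor / 2$ and

$$
C_{m,k} = \frac{\prod_{i=-m}^{k-1} \left| L^\theta\langle z_i \rangle(\cdot, \mathsf{X}) \right|_\infty^2 \, \left| L^\theta\langle z_k \rangle(\cdot, \mathsf{X}) \right|_\infty}{\chi L^\theta\langle z_{-m}^{k-1} \rangle (\mathbf{1}_{\mathsf{X}}) \times \chi' L^\theta\langle z_{-m}^{k-1} \rangle (L^\theta\langle z_k \rangle \mathbf{1}_{\mathsf{X}})} . \tag{41}
$$

Moreover, by (22),

$$
\frac{\chi L^\theta\langle z_{-m}^{k-1} \rangle (L^\theta\langle z_k \rangle \mathbf{1}_{\mathsf{X}}) \times \chi' L^\theta\langle z_{-m}^{k-1} \rangle (\mathbf{1}_{\mathsf{X}})}{\chi L^\theta\langle z_{-m}^{k-1} \rangle (\mathbf{1}_{\mathsf{X}}) \times \chi' L^\theta\langle z_{-m}^{k-1} \rangle (L^\theta\langle z_k \rangle \mathbf{1}_{\mathsf{X}})}
= 1 + \frac{\Delta^\theta_{\chi,\chi'}\langle z_{-m}^{k-1} \rangle (L^\theta\langle z_k \rangle \mathbf{1}_{\mathsf{X}}, \mathbf{1}_{\mathsf{X}})}{\chi L^\theta\langle z_{-m}^{k-1} \rangle (\mathbf{1}_{\mathsf{X}}) \times \chi' L^\theta\langle z_{-m}^{k-1} \rangle (L^\theta\langle z_k \rangle \mathbf{1}_{\mathsf{X}})} .
$$

Plugging this identity into (40) and then using (38) yields

$$
\ln p^\theta_\chi(z_k | z_{-m}^{k-1}) - \ln p^\theta_{\chi'}(z_k | z_{-m}^{k-1}) \le 2 \left(1 - \rho^{a(m+k)}\right)^{-1} \left\{ \rho^{a(m+k)} + \eta^{b(m+k)} C_{m,k} \right\} . \tag{42}
$$

For any sequence $z_{-m}^{k}$, we have

$$
\chi L^\theta\langle z_{-m}^{k-1} \rangle (\mathbf{1}_{\mathsf{X}}) \ge \chi(\mathsf{D}) \prod_{i=-m}^{k-1} \inf_{x \in \mathsf{D}} L^\theta\langle z_i \rangle(x, \mathsf{D}) , \qquad
\chi' L^\theta\langle z_{-m}^{k-1} \rangle (L^\theta\langle z_k \rangle \mathbf{1}_{\mathsf{X}}) \ge \chi'(\mathsf{D}) \prod_{i=-m}^{k} \inf_{x \in \mathsf{D}} L^\theta\langle z_i \rangle(x, \mathsf{D}) . \tag{43}
$$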



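Exchanging $\chi$ and $\chi'$ in (42) yields an upper bound for $|\ln p^\theta_\chi(z_k | z_{-m}^{k-1}) - \ln p^\theta_{\chi'}(z_k | z_{-m}^{k-1})|$.
More precisely, for any sequence $z_{-m}^{k}$ satisfying (39), we
have

$$
\sup_{\theta \in \Theta} \left| \ln p^\theta_\chi(z_k | z_{-m}^{k-1}) - \ln p^\theta_{\chi'}(z_k | z_{-m}^{k-1}) \right| \le 2 \left(1 - \rho^{a(m+k)}\right)^{-1}
\times \left\{ \rho^{a(m+k)} + \frac{\eta^{b(m+k)}}{\chi(\mathsf{D}) \chi'(\mathsf{D})} \prod_{j=-m}^{k} D_{z_j} \right\} , \tag{44}
$$

where, for $z \in \mathsf{Y}^r$,

$$
D_z = \frac{\sup_{\theta \in \Theta} \left| L^\theta\langle z \rangle(\cdot, \mathsf{X}) \right|_\infty^2}{\inf_{\theta \in \Theta} \inf_{x \in \mathsf{D}} \left( L^\theta\langle z \rangle(x, \mathsf{D}) \right)^2} . \tag{45}
$$

Assume that $\mathbb{E}[\ln^+(D_{Z_0})] < \infty$ and set $\eta$ small enough so that $\mathbb{E}[\ln^+(D_{Z_0})] < -\ln \eta$.
By Lemma 6, there exist a $\mathbb{P}$-a.s. finite random variable $C$ and a
constant $\kappa \in (0,1)$ such that, for all $k \ge 1$, $m \ge 0$,

$$
\rho^{a(m+k)} + \frac{\eta^{b(m+k)}}{\chi(\mathsf{D}) \chi'(\mathsf{D})} \prod_{j=-m}^{k} D_{Z_j} \le C \kappa^{m+k} , \quad \mathbb{P}\text{-a.s.}
$$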
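It remains to show that $\mathbb{E}[\ln^+(D_{Z_0})] < \infty$. Since for any $a, b > 0$, $\ln^+(a/b) \le \ln^+(a) + \ln^-(b)$,

$$
\ln^+(D_z) \le 2 \ln^+\left( \sup_{\theta \in \Theta} \left| L^\theta\langle z \rangle(\cdot, \mathsf{X}) \right|_\infty \right) + 2 \ln^-\left( \inf_{\theta \in \Theta} \inf_{x \in \mathsf{D}} L^\theta\langle z \rangle(x, \mathsf{D}) \right) . \tag{46}
$$

Since, for any $z = y_0^{r-1} \in \mathsf{Y}^r$, $\sup_{\theta \in \Theta} |L^\theta\langle z \rangle(\cdot, \mathsf{X})|_\infty \le \prod_{i=0}^{r-1} \sup_{\theta \in \Theta} |g^\theta(\cdot, y_i)|_\infty$,
(A1)-(iii) and (A2) imply that $\mathbb{E}[\ln^+(D_{Z_0})] < \infty$. Finally, according to (44),

$$
\sup_{\theta \in \Theta} \left| \ln p^\theta_\chi(Z_k | Z_{-m}^{k-1}) - \ln p^\theta_{\chi'}(Z_k | Z_{-m}^{k-1}) \right| \le 2 \left(1 - \rho^{a(m+k)}\right)^{-1} C \kappa^{m+k} , \quad \mathbb{P}\text{-a.s.},
$$

provided that

$$
(m+k)^{-1} \sum_{j=-m}^{k-1} \mathbf{1}_{\mathsf{K}}(Z_j) \ge \max\left(1 - \gamma^-, (1 + \gamma^+)/2\right) . \tag{47}
$$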

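It thus remains to show the existence of a $\mathbb{P}$-a.s. finite random variable $K$
such that for any $k \ge K$ and any $m \ge 0$, (47) holds $\mathbb{P}$-a.s. Under (A1)-(i),
$1 - \mathbb{P}[Z_0 \in \mathsf{K}] < 2\mathbb{P}[Z_0 \in \mathsf{K}] - 1$. Then, choose $\gamma^-$, $\tilde\gamma^-$, $\gamma^+$ and $\tilde\gamma^+$ such that

$$
1 - \mathbb{P}[Z_0 \in \mathsf{K}] < \tilde\gamma^- < \gamma^- < \gamma^+ < \tilde\gamma^+ < 2\mathbb{P}[Z_0 \in \mathsf{K}] - 1 . \tag{48}
$$

By construction, $(1 + \tilde\gamma^+)/2 < \mathbb{P}[Z_0 \in \mathsf{K}]$ and $1 - \tilde\gamma^- < \mathbb{P}[Z_0 \in \mathsf{K}]$. Since
$(Z_k)_{k \in \mathbb{Z}}$ is stationary and ergodic, the Birkhoff ergodic theorem ensures
that there exists a $\mathbb{P}$-a.s. finite random variable $B$ such that for any $k \ge B$
and $m \ge B$, $\mathbb{P}$-a.s.,

$$
\max\left( 1 - \tilde\gamma^-, \frac{1 + \tilde\gamma^+}{2} \right) \le k^{-1} \sum_{i=0}^{k-1} \mathbf{1}_{\mathsf{K}}(Z_i) , \tag{49}
$$

$$
\max\left( 1 - \tilde\gamma^-, \frac{1 + \tilde\gamma^+}{2} \right) \le m^{-1} \sum_{i=-m}^{-1} \mathbf{1}_{\mathsf{K}}(Z_i) . \tag{50}
$$

Set $K^+ = B(1 + \tilde\gamma^+)/(\tilde\gamma^+ - \gamma^+)$. If $m \ge B$ and $k \ge K^+$, then using that
$K^+ \ge B$, $\mathbb{P}$-a.s.,

$$
\frac{\sum_{i=-m}^{k-1} \mathbf{1}_{\mathsf{K}}(Z_i)}{k+m} \ge \frac{k(1 + \tilde\gamma^+)/2 + m(1 + \tilde\gamma^+)/2}{k+m} = (1 + \tilde\gamma^+)/2 \ge (1 + \gamma^+)/2 .
$$

Now, if $0 \le m < B$ and $k \ge K^+$,

$$
\frac{\sum_{i=-m}^{k-1} \mathbf{1}_{\mathsf{K}}(Z_i)}{k+m} \ge \frac{k(1 + \tilde\gamma^+)/2}{k+m} \ge \frac{K^+ (1 + \tilde\gamma^+)/2}{K^+ + B} \ge (1 + \gamma^+)/2 .
$$

Similarly, setting $K^- = B(1 - \tilde\gamma^-)/(\gamma^- - \tilde\gamma^-)$, we obtain, for all $m \ge 0$ and
all $k \ge K^-$ that, $\mathbb{P}$-a.s.,

$$
\frac{\sum_{i=-m}^{k-1} \mathbf{1}_{\mathsf{K}}(Z_i)}{k+m} \ge 1 - \gamma^- .
$$

The proof of (35) is now completed by setting $K = K^+ \vee K^-$.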
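Proof of (36). Note that

$$
p^\theta_\chi(z_k | z_{-m-1}^{k-1}) = p^\theta_{\chi'}(z_k | z_{-m}^{k-1})
$$

with $\chi'(A) = \chi(L^\theta\langle z_{-m-1} \rangle \mathbf{1}_A)/\chi(L^\theta\langle z_{-m-1} \rangle \mathbf{1}_{\mathsf{X}})$. Since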

X'(D) x(L^(^-m-i)lD) " X(D) ' 
where is defined in (HSl) . (IHI) writes: 



sup 



lnpj(^.k^-^) - lnp^(z,|/-i_i) < 2(1 - 



a(r?i+fc) _|_ ^ 



p{m+k) 



k-1 



j=-m 



The rest of the proof of (36) follows the same lines as that of (35) and is omitted
for brevity.
Proof of (37). Noting that Equation (47) when $k = 0$ follows immediately
from (50), the proof of (37) follows the same lines as the proof of (36)
and is omitted for brevity.

□ 

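Corollary 9 (Corollary of Proposition 8). Assume (A1)-(A2). For any
$\theta \in \Theta$, there exists a real-valued measurable function $\pi^\theta$ such that for any
probability measure $\chi \in \mathcal{M}(\mathsf{D}, r)$ (where $\mathcal{M}(\mathsf{D}, r)$ is defined
in (8)),

$$
\mathbb{P}\left[ \lim_{m \to \infty} p^\theta_\chi(Z_0 | Z_{-m}^{-1}) = \pi^\theta(Z_{-\infty}^{0}) \right] = 1 . \tag{51}
$$

In the sequel, we denote $p^\theta(Z_0 | Z_{-\infty}^{-1}) = \pi^\theta(Z_{-\infty}^{0})$ and, for $n > 0$,
$p^\theta(Z_0^{n-1} | Z_{-\infty}^{-1}) = \prod_{i=0}^{n-1} \pi^\theta(Z_{-\infty}^{i})$.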
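5.1.2. Consistency of the block MLE.

Proposition 10. Assume (A1)-(A2). Then,
(i) For any $\theta \in \Theta$,

$$
\mathbb{E}\left[ \left| \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right| \right] < \infty . \tag{52}
$$

(ii) For any probability measure $\chi \in \mathcal{M}(\mathsf{D}, r)$ (where $\mathcal{M}(\mathsf{D}, r)$ is defined in
(8)),

$$
\lim_{n \to \infty} \sup_{\theta \in \Theta} \left| n^{-1} \ln p^\theta_\chi(Z_0^{n-1}) - n^{-1} \ln p^\theta(Z_0^{n-1} | Z_{-\infty}^{-1}) \right| = 0 , \quad \mathbb{P}\text{-a.s.}
$$

(iii) For any $\theta \in \Theta$, and for any probability measure $\chi \in \mathcal{M}(\mathsf{D}, r)$,

$$
\lim_{n \to \infty} n^{-1} \ln p^\theta_\chi(Z_0^{n-1}) = \mathbb{E}\left[ \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] , \quad \mathbb{P}\text{-a.s.}
$$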



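Proof. Proof of (i). It follows from (51) that, $\mathbb{P}$-a.s.,

$$
p^\theta(Z_0 | Z_{-\infty}^{-1}) = \lim_{m \to \infty} p^\theta_\chi(Z_0 | Z_{-m}^{-1}) \le \left| L^\theta\langle Z_0 \rangle(\cdot, \mathsf{X}) \right|_\infty \le \prod_{i=0}^{r-1} \left| g^\theta(\cdot, Y_i) \right|_\infty . \tag{53}
$$

Then, (A2) shows that

$$
\mathbb{E}\left[ \ln^+ p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] \le \mathbb{E}\left[ \ln^+ \left| L^\theta\langle Z_0 \rangle(\cdot, \mathsf{X}) \right|_\infty \right] < \infty .
$$

We now show that $\mathbb{E}[\ln^- p^\theta(Z_0 | Z_{-\infty}^{-1})] < \infty$ by establishing that $\mathbb{E}[\ln p^\theta(Z_0 | Z_{-\infty}^{-1})] > -\infty$.
For that purpose, introduce the sequence

$$
L^\theta_m = m^{-1} \sum_{\ell=1}^{m} \left[ \ln^+ \left| L^\theta\langle Z_0 \rangle(\cdot, \mathsf{X}) \right|_\infty - \ln p^\theta_\chi(Z_0 | Z_{-\ell}^{-1}) \right] .
$$

By (53), the sequence $(L^\theta_m)_{m \ge 0}$ is nonnegative and the Fatou lemma implies
that

$$
\liminf_{m \to \infty} \mathbb{E}[L^\theta_m] \ge \mathbb{E}\left[ \liminf_{m \to \infty} L^\theta_m \right] . \tag{54}
$$

By definition,

$$
\liminf_{m \to \infty} \mathbb{E}[L^\theta_m] = \mathbb{E}\left[ \ln^+ \left| L^\theta\langle Z_0 \rangle(\cdot, \mathsf{X}) \right|_\infty \right] - \limsup_{m \to \infty} m^{-1} \sum_{\ell=1}^{m} \mathbb{E}\left[ \ln p^\theta_\chi(Z_0 | Z_{-\ell}^{-1}) \right] \tag{55}
$$

and

$$
\mathbb{E}\left[ \liminf_{m \to \infty} L^\theta_m \right] = \mathbb{E}\left[ \ln^+ \left| L^\theta\langle Z_0 \rangle(\cdot, \mathsf{X}) \right|_\infty \right] - \mathbb{E}\left[ \limsup_{m \to \infty} m^{-1} \sum_{\ell=1}^{m} \ln p^\theta_\chi(Z_0 | Z_{-\ell}^{-1}) \right] . \tag{56}
$$

Since $(Y_k)_{k \in \mathbb{Z}}$ is stationary, for any $\ell \in \mathbb{N}$, $\mathbb{E}[\ln p^\theta_\chi(Z_0 | Z_{-\ell}^{-1})] = \mathbb{E}[\ln p^\theta_\chi(Z_\ell | Z_0^{\ell-1})]$,
showing that

$$
m^{-1} \sum_{\ell=1}^{m} \mathbb{E}\left[ \ln p^\theta_\chi(Z_0 | Z_{-\ell}^{-1}) \right] = m^{-1} \sum_{\ell=1}^{m} \mathbb{E}\left[ \ln p^\theta_\chi(Z_\ell | Z_0^{\ell-1}) \right] . \tag{57}
$$

The Cesàro mean convergence lemma implies that, $\mathbb{P}$-a.s.,

$$
\limsup_{m \to \infty} m^{-1} \sum_{\ell=1}^{m} \ln p^\theta_\chi(Z_0 | Z_{-\ell}^{-1}) = \lim_{\ell \to \infty} \ln p^\theta_\chi(Z_0 | Z_{-\ell}^{-1}) = \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) . \tag{58}
$$

Combining (54), (55), (56), (57), and (58) yields

$$
\mathbb{E}\left[ \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] \ge \limsup_{m \to \infty} m^{-1} \sum_{\ell=1}^{m} \mathbb{E}\left[ \ln p^\theta_\chi(Z_\ell | Z_0^{\ell-1}) \right]
= \limsup_{m \to \infty} \left\{ \mathbb{E}\left[ m^{-1} \ln p^\theta_\chi(Z_0^m) \right] - m^{-1} \mathbb{E}\left[ \ln p^\theta_\chi(Z_0) \right] \right\} > -\infty , \tag{59}
$$

where the last bound follows from (A1)-(iii) and the minoration

$$
\ln p^\theta_\chi(Z_0^m) \ge \ln \chi(\mathsf{D}) + \sum_{i=0}^{m} \ln \inf_{x \in \mathsf{D}} L^\theta\langle Z_i \rangle(x, \mathsf{D}) .
$$

The proof of (i) follows.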



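Proof of (ii). According to Proposition 8-(36), there exists a random
variable $C$ satisfying $\mathbb{P}[C < \infty] = 1$ such that for all $k \ge K$ and $m \ge 0$,

$$
\sup_{\theta \in \Theta} \left| \ln p^\theta_\chi(Z_k | Z_{-m}^{k-1}) - \ln p^\theta_\chi(Z_k | Z_{-m-1}^{k-1}) \right| \le C \kappa^{k+m} , \quad \mathbb{P}\text{-a.s.},
$$

which implies that

$$
\sup_{\theta \in \Theta} \left| \ln p^\theta_\chi(Z_k | Z_0^{k-1}) - \ln p^\theta(Z_k | Z_{-\infty}^{k-1}) \right| \le C \kappa^k / (1 - \kappa) , \quad \mathbb{P}\text{-a.s.}
$$

The proof of (ii) follows from the obvious decompositions

$$
n^{-1} \ln p^\theta_\chi(Z_0^{n-1}) = n^{-1} \sum_{k=1}^{n-1} \ln p^\theta_\chi(Z_k | Z_0^{k-1}) + n^{-1} \ln p^\theta_\chi(Z_0) ,
$$

$$
n^{-1} \ln p^\theta(Z_0^{n-1} | Z_{-\infty}^{-1}) = n^{-1} \sum_{k=0}^{n-1} \ln p^\theta(Z_k | Z_{-\infty}^{k-1}) . \tag{60}
$$

The proof of (iii) follows from (52) and (60) using the Birkhoff theorem
(see for example [?, Theorem 1-14]).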

□ 
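
The decomposition of a log-likelihood into a sum of conditional log-likelihoods, used in the proof above, can be illustrated on a finite-state HMM with a normalised forward recursion. This is only a sketch under assumptions that are not taken from the paper: the state space is finite, the emission density is evaluated at the current state, and the names `loglik_terms`, `Q` and `G` are hypothetical.

```python
import numpy as np

def loglik_terms(chi, Q, G):
    """Return ln p(Y_k | Y_0^{k-1}) for k = 0..n-1 in a finite-state HMM.

    chi: initial law of X_0, shape (d,); Q: transition matrix, shape (d, d);
    G[k, x] = g(x, Y_k): emission densities evaluated at the data, shape (n, d).
    """
    terms = []
    pred = np.asarray(chi, dtype=float)   # predictive law of X_k given Y_0^{k-1}
    for g in G:
        joint = pred * g                  # unnormalised filter
        c = joint.sum()                   # conditional likelihood p(Y_k | Y_0^{k-1})
        terms.append(np.log(c))
        pred = (joint / c) @ Q            # next predictive law
    return np.array(terms)

rng = np.random.default_rng(0)
Q = np.array([[0.9, 0.1], [0.2, 0.8]])
chi = np.array([0.5, 0.5])
G = rng.uniform(0.1, 1.0, size=(6, 2))    # placeholder emission values
terms = loglik_terms(chi, Q, G)

# Direct evaluation of the joint likelihood as an unnormalised matrix product
v = chi * G[0]
for g in G[1:]:
    v = (v @ Q) * g
direct = np.log(v.sum())
```

Summing `terms` recovers `direct` up to rounding; dividing by the number of observations gives the normalised log-likelihood whose limit is studied in Proposition 10.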

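Proposition 11. Assume (A1)-(A3). Let $\chi$ be a probability measure such
that $\chi \in \mathcal{M}(\mathsf{D}, r)$ (where $\mathcal{M}(\mathsf{D}, r)$ is defined in (8)).

(i) For any $\theta_0 \in \Theta$ and any $\rho > 0$,

$$
\limsup_{n \to \infty} \sup_{\theta \in B(\theta_0, \rho)} \frac{1}{n} \ln p^\theta_\chi(Z_0^{n-1}) \le \mathbb{E}\left[ \sup_{\theta \in B(\theta_0, \rho)} \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] , \quad \mathbb{P}\text{-a.s.}
$$

(ii) The function $\theta \mapsto \mathbb{E}[\ln p^\theta(Z_0 | Z_{-\infty}^{-1})]$ is upper semi-continuous.
(iii) For any compact set $\Xi \subset \Theta$, the sequence $(\sup_{\theta \in \Xi} n^{-1} \ln p^\theta_\chi(Z_0^{n-1}))_{n \ge 0}$
converges $\mathbb{P}$-a.s. and

$$
\lim_{n \to \infty} \sup_{\theta \in \Xi} \frac{1}{n} \ln p^\theta_\chi(Z_0^{n-1}) = \sup_{\theta \in \Xi} \mathbb{E}\left[ \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] , \quad \mathbb{P}\text{-a.s.}
$$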
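Proof. Proof of (i). Proposition 10-(ii) shows that

$$
\limsup_{n \to \infty} \sup_{\theta \in B(\theta_0, \rho)} \frac{1}{n} \ln p^\theta_\chi(Z_0^{n-1}) \le \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \sup_{\theta \in B(\theta_0, \rho)} \ln p^\theta(Z_i | Z_{-\infty}^{i-1}) , \quad \mathbb{P}\text{-a.s.} \tag{61}
$$

By (53), for any $\theta_0 \in \Theta$ and $\rho > 0$,

$$
\ln p^{\theta_0}(Z_0 | Z_{-\infty}^{-1}) \le \sup_{\theta \in B(\theta_0, \rho)} \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \le \sum_{i=0}^{r-1} \sup_{\theta \in \Theta} \ln^+ \left| g^\theta(\cdot, Y_i) \right|_\infty , \quad \mathbb{P}\text{-a.s.}, \tag{62}
$$

which shows, using (52) and (A2), that

$$
\mathbb{E}\left[ \left| \sup_{\theta \in B(\theta_0, \rho)} \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right| \right] < \infty .
$$

The Birkhoff theorem therefore implies

$$
\limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \sup_{\theta \in B(\theta_0, \rho)} \ln p^\theta(Z_i | Z_{-\infty}^{i-1}) = \mathbb{E}\left[ \sup_{\theta \in B(\theta_0, \rho)} \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] , \quad \mathbb{P}\text{-a.s.}, \tag{63}
$$

which completes the proof of (i).
Proof of (ii). First note that

$$
\sup_{\theta \in B(\theta_0, \rho)} \mathbb{E}\left[ \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] \le \mathbb{E}\left[ \sup_{\theta \in B(\theta_0, \rho)} \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] . \tag{64}
$$

Now, since under (A3), for any $m \ge 0$, $\mathbb{P}$-a.s., the function $\theta \mapsto \ln p^\theta_\chi(Z_0 | Z_{-m}^{-1})$
is continuous, then $\mathbb{P}$-a.s., the function $\theta \mapsto \ln p^\theta(Z_0 | Z_{-\infty}^{-1})$ is continuous
as a uniform limit of continuous functions. Using (62),

$$
\sum_{i=0}^{r-1} \sup_{\theta \in \Theta} \ln^+ \left| g^\theta(\cdot, Y_i) \right|_\infty - \sup_{\theta \in B(\theta_0, \rho)} \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \ge 0 ,
$$

the monotone convergence theorem therefore implies that

$$
\lim_{\rho \downarrow 0} \mathbb{E}\left[ \sup_{\theta \in B(\theta_0, \rho)} \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] = \mathbb{E}\left[ \lim_{\rho \downarrow 0} \sup_{\theta \in B(\theta_0, \rho)} \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] = \mathbb{E}\left[ \ln p^{\theta_0}(Z_0 | Z_{-\infty}^{-1}) \right] . \tag{65}
$$

Combining (64) and (65) shows that

$$
\limsup_{\theta \to \theta_0} \mathbb{E}\left[ \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] \le \mathbb{E}\left[ \ln p^{\theta_0}(Z_0 | Z_{-\infty}^{-1}) \right] .
$$

Proof of (iii). By taking the limit of both sides of (i) with respect to
$\rho \downarrow 0$, (65) shows that for any $\theta_0 \in \Theta$,

$$
\lim_{\rho \downarrow 0} \limsup_{n \to \infty} \sup_{\theta \in B(\theta_0, \rho)} \frac{1}{n} \ln p^\theta_\chi(Z_0^{n-1}) \le \mathbb{E}\left[ \ln p^{\theta_0}(Z_0 | Z_{-\infty}^{-1}) \right] , \quad \mathbb{P}\text{-a.s.} \tag{66}
$$

Therefore, for any $\delta > 0$ and $\theta_0 \in \Xi$ there exists $\rho_{\theta_0} > 0$ such that

$$
\limsup_{n \to \infty} \sup_{\theta \in B(\theta_0, \rho_{\theta_0})} \frac{1}{n} \ln p^\theta_\chi(Z_0^{n-1}) \le \mathbb{E}\left[ \ln p^{\theta_0}(Z_0 | Z_{-\infty}^{-1}) \right] + \delta , \quad \mathbb{P}\text{-a.s.}
$$

Since $\Xi$ is compact, by extracting a finite covering, the latter inequality
shows that

$$
\limsup_{n \to \infty} \sup_{\theta \in \Xi} \frac{1}{n} \ln p^\theta_\chi(Z_0^{n-1}) \le \sup_{\theta_0 \in \Xi} \mathbb{E}\left[ \ln p^{\theta_0}(Z_0 | Z_{-\infty}^{-1}) \right] + \delta , \quad \mathbb{P}\text{-a.s.}
$$

Since $\delta$ is arbitrary, we therefore have

$$
\limsup_{n \to \infty} \sup_{\theta \in \Xi} \frac{1}{n} \ln p^\theta_\chi(Z_0^{n-1}) \le \sup_{\theta_0 \in \Xi} \mathbb{E}\left[ \ln p^{\theta_0}(Z_0 | Z_{-\infty}^{-1}) \right] , \quad \mathbb{P}\text{-a.s.} \tag{67}
$$

Now, since for any $\theta_0 \in \Xi$,

$$
\sup_{\theta \in \Xi} \frac{1}{n} \ln p^\theta_\chi(Z_0^{n-1}) \ge \frac{1}{n} \ln p^{\theta_0}_\chi(Z_0^{n-1}) ,
$$

Proposition 10-(iii) yields

$$
\liminf_{n \to \infty} \sup_{\theta \in \Xi} \frac{1}{n} \ln p^\theta_\chi(Z_0^{n-1}) \ge \mathbb{E}\left[ \ln p^{\theta_0}(Z_0 | Z_{-\infty}^{-1}) \right] , \quad \mathbb{P}\text{-a.s.}
$$

$\theta_0$ being arbitrary in $\Xi$, we finally obtain

$$
\liminf_{n \to \infty} \sup_{\theta \in \Xi} \frac{1}{n} \ln p^\theta_\chi(Z_0^{n-1}) \ge \sup_{\theta_0 \in \Xi} \mathbb{E}\left[ \ln p^{\theta_0}(Z_0 | Z_{-\infty}^{-1}) \right] , \quad \mathbb{P}\text{-a.s.}
$$

Combining this inequality with (67) completes the proof.

□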



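Theorem 12. Assume (A1)-(A3). Then, for any probability measure $\chi \in \mathcal{M}(\mathsf{D}, r)$,

$$
\lim_{n \to \infty} d(\hat\theta_{\chi,nr}, \Theta^\star) = 0 , \quad \mathbb{P}\text{-a.s.},
$$

where $\Theta^\star$ is defined by $\Theta^\star = \arg\max_{\theta \in \Theta} \mathbb{E}[\ln p^\theta(Z_0 | Z_{-\infty}^{-1})]$.

Proof. By Proposition 11-(ii), the function $\theta \mapsto \mathbb{E}[\ln p^\theta(Z_0 | Z_{-\infty}^{-1})]$ is
upper semi-continuous. Therefore the set $\Theta^\star$ is compact as a closed subset
of the compact set $\Theta$, so that for any $\delta > 0$, $\Xi_\delta = \{\theta \in \Theta ; d(\theta, \Theta^\star) \ge \delta\}$
is also a compact set. In addition, as an upper semi-continuous function,
$\theta \mapsto \mathbb{E}[\ln p^\theta(Z_0 | Z_{-\infty}^{-1})]$ restricted to $\Xi_\delta$ attains its maximum, which implies
that

$$
\sup_{\theta \in \Xi_\delta} \mathbb{E}\left[ \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] = \max_{\theta \in \Xi_\delta} \mathbb{E}\left[ \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] < \mathbb{E}\left[ \ln p^{\theta^\star}(Z_0 | Z_{-\infty}^{-1}) \right] ,
$$

where $\theta^\star$ is any point in $\Theta^\star$. Combining this with Proposition 11-(iii) yields

$$
\lim_{n \to \infty} \sup_{\theta \in \Xi_\delta} \frac{1}{n} \ln p^\theta_\chi(Z_0^{n-1}) < \mathbb{E}\left[ \ln p^{\theta^\star}(Z_0 | Z_{-\infty}^{-1}) \right] , \quad \mathbb{P}\text{-a.s.}
$$

Using that

$$
\lim_{n \to \infty} \frac{1}{n} \ln p^{\theta^\star}_\chi(Z_0^{n-1}) = \mathbb{E}\left[ \ln p^{\theta^\star}(Z_0 | Z_{-\infty}^{-1}) \right] , \quad \mathbb{P}\text{-a.s.},
$$

we finally obtain that, $\mathbb{P}$-a.s., $\hat\theta_{\chi,nr} \in \Xi_\delta$ only finitely many times. The proof is
completed.

□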
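
To give a feel for what convergence to the maximiser of the limiting criterion means under misspecification, here is a deliberately simple sketch. It is not an HMM example from the paper: it assumes i.i.d. data (a degenerate HMM with a trivial state) and a hypothetical Gaussian location family fitted to exponential data, so that the maximiser of the expected log-likelihood is the mean of the data distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=5000)   # data NOT drawn from the Gaussian family
grid = np.linspace(0.0, 4.0, 401)           # candidate parameter values theta

# Average log-likelihood of the N(theta, 1) model, up to a theta-free constant
avg_loglik = np.array([-0.5 * np.mean((y - th) ** 2) for th in grid])
theta_hat = grid[np.argmax(avg_loglik)]
```

In this toy case the grid maximiser `theta_hat` is the grid point closest to the sample mean, which itself converges to the mean of the exponential distribution, the value that minimises the relative entropy within the Gaussian location family.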



5.2. Proof of Proposition 1 and Theorem 2. We now have all the tools for
obtaining the consistency of the MLE as a byproduct of the results obtained
for the block MLE. We first state and prove the forgetting of the initial
distribution for the predictive filter.

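Lemma 13. Assume (A1). Let $0 < \gamma^- < \gamma^+ < 1$. Then, for all $\eta > 0$,
there exists $\rho \in (0,1)$ such that, for all sequences $(z_i)_{i \ge 0}$ satisfying

$$
n^{-1} \sum_{i=0}^{n-1} \mathbf{1}_{\mathsf{K}}(z_i) \ge \max\left(1 - \gamma^-, (1 + \gamma^+)/2\right) ,
$$

all $\beta \in (\gamma^-, \gamma^+)$, all measurable functions $f$, all probability measures $\chi$ and
$\chi'$ and all $\theta \in \Theta$,

$$
\left| \frac{\chi L^\theta\langle z_0^{n-1} \rangle f}{\chi L^\theta\langle z_0^{n-1} \rangle \mathbf{1}_{\mathsf{X}}} - \frac{\chi' L^\theta\langle z_0^{n-1} \rangle f}{\chi' L^\theta\langle z_0^{n-1} \rangle \mathbf{1}_{\mathsf{X}}} \right|
\le 2 \left\{ \rho^{\lfloor n(\beta - \gamma^-) \rfloor} + \frac{\eta^{\lfloor n(\gamma^+ - \beta) \rfloor / 2}}{\chi(\mathsf{D}) \chi'(\mathsf{D})} \prod_{i=0}^{n-1} D_{z_i} \right\} |f|_\infty ,
$$

where $D_z$ is defined in (45).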

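Proof. By Proposition 5,

$$
\left| \frac{\chi L^\theta\langle z_0^{n-1} \rangle f}{\chi L^\theta\langle z_0^{n-1} \rangle \mathbf{1}_{\mathsf{X}}} - \frac{\chi' L^\theta\langle z_0^{n-1} \rangle f}{\chi' L^\theta\langle z_0^{n-1} \rangle \mathbf{1}_{\mathsf{X}}} \right|
= \frac{\left| \Delta^\theta_{\chi,\chi'}\langle z_0^{n-1} \rangle(f, \mathbf{1}_{\mathsf{X}}) \right|}{\chi L^\theta\langle z_0^{n-1} \rangle \mathbf{1}_{\mathsf{X}} \times \chi' L^\theta\langle z_0^{n-1} \rangle \mathbf{1}_{\mathsf{X}}}
\le 2 \rho^{\lfloor n(\beta - \gamma^-) \rfloor} |f|_\infty + 2 \eta^{\lfloor n(\gamma^+ - \beta) \rfloor / 2} \frac{\prod_{i=0}^{n-1} \left| L^\theta\langle z_i \rangle(\cdot, \mathsf{X}) \right|_\infty^2}{\chi L^\theta\langle z_0^{n-1} \rangle \mathbf{1}_{\mathsf{X}} \times \chi' L^\theta\langle z_0^{n-1} \rangle \mathbf{1}_{\mathsf{X}}} |f|_\infty ,
$$

where we have used that

$$
\frac{\chi L^\theta\langle z_0^{n-1} \rangle f}{\chi L^\theta\langle z_0^{n-1} \rangle \mathbf{1}_{\mathsf{X}}} \le |f|_\infty , \qquad \frac{\chi' L^\theta\langle z_0^{n-1} \rangle f}{\chi' L^\theta\langle z_0^{n-1} \rangle \mathbf{1}_{\mathsf{X}}} \le |f|_\infty .
$$

The proof follows by noting that (43) implies that

$$
\frac{\prod_{i=0}^{n-1} \left| L^\theta\langle z_i \rangle(\cdot, \mathsf{X}) \right|_\infty^2}{\chi L^\theta\langle z_0^{n-1} \rangle \mathbf{1}_{\mathsf{X}} \times \chi' L^\theta\langle z_0^{n-1} \rangle \mathbf{1}_{\mathsf{X}}} \le \frac{\prod_{i=0}^{n-1} D_{z_i}}{\chi(\mathsf{D}) \chi'(\mathsf{D})} .
$$

□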
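
The forgetting property of Lemma 13 can be observed numerically in the simplest possible setting. The sketch below assumes a two-state chain with a strictly positive transition matrix (so that the whole state space plays the role of a Doeblin set) and placeholder emission values; the function name `filter_path` and all numerical values are hypothetical.

```python
import numpy as np

def filter_path(chi, Q, G):
    """Normalised measures chi L<z_0^{n-1}> / (chi L<z_0^{n-1}> 1) for n = 1..len(G)."""
    out, v = [], np.asarray(chi, dtype=float)
    for g in G:
        v = (v @ Q) * g      # one step of the unnormalised kernel (assumed form Q(x, dx') g(x', z))
        v = v / v.sum()      # normalise to a probability vector
        out.append(v)
    return np.array(out)

rng = np.random.default_rng(0)
Q = np.array([[0.7, 0.3], [0.4, 0.6]])     # strictly positive transition matrix
G = rng.uniform(0.2, 1.0, size=(30, 2))    # placeholder emission values g(x, z_k)
a = filter_path([0.9, 0.1], Q, G)          # filter started from chi
b = filter_path([0.1, 0.9], Q, G)          # filter started from chi'
tv = 0.5 * np.abs(a - b).sum(axis=1)       # total variation distance after n steps
```

The sequence `tv` decays geometrically, which is the finite-state analogue of the exponential bound stated in the lemma; in the general setting of the paper the decay additionally depends on how often the observations visit the set $\mathsf{K}$.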



Proof of Proposition 1.

Proof of (i). Let $\chi$ be a probability measure such that $\chi(\mathsf{D}) > 0$. The first
step of the proof consists in using the forgetting property obtained in Lemma
13 to show that, $\mathbb{P}$-a.s., the sequence $(p^\theta_\chi(Y_0 | Y_{-\ell}^{-1}))_{\ell \ge 0}$ converges. Denote,
for any $t \in \{1, \dots, r\}$,



i(A) 



xL'^iyZZzDlA 



xL'{yZZzl)lx 

Then, write for any m > 0, t € {1, ■ ■ ■ ,r} and any y^.^^_i G Y"^''"'"*"'"^, 

xUl^'{z-_l){g\-,y,)) 



pt{yo\y-lnr-t) =pIo {yo\z^ln) 



x'tI^'{zZl){lx) 



Let < 7 < < 1. Lemma [T3l shows that for any t G {!,..., r} and 
r] > 0, there exists p G (0, 1) such that, if 



m 



^ 1k(^^) >max(l-7-,(l + 7+)/2) 



then for all /3 e (7", 7+), and 6 e Q, 

\Pxiyo\yzln.r-t) - Pxiyo\y-mr)\ 



yo, 



<2 U 



Lm(/3-7-)J 



yoj 



j=-m 



eee 
where



D' 



1 



max 



t=i,...,r~iinfeeex' t(D) X(D) 



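$(D'_{-m})_{m \ge 0}$ is a stationary sequence. Using the same argument as in the proof
of (36) and the condition $\chi \in \mathcal{M}(\mathsf{D}, r)$ (defined in (8)), we have $\mathbb{E}[\ln^+ D'_{-m}] < \infty$.
By choosing $\gamma^+$ and $\gamma^-$ such that $\mathbb{P}[Z_0 \in \mathsf{K}] > \max(1 - \gamma^-, (1 + \gamma^+)/2)$
and by applying Lemma 6, it follows that there exist $\varrho_\chi \in (0,1)$ and a $\mathbb{P}$-a.s.
finite random variable $C_\chi$ such that for any $\ell \ge 1$,

$$
\left| p^\theta_\chi(Y_0 | Y_{-\ell}^{-1}) - p^\theta_\chi(Y_0 | Y_{-\ell-1}^{-1}) \right| \le C_\chi \varrho_\chi^{\ell} , \quad \mathbb{P}\text{-a.s.}
$$

Similarly, for any probability measure $\chi'$ such that $\chi'(\mathsf{D}) > 0$, there exist
$\varrho_{\chi,\chi'} \in (0,1)$ and a $\mathbb{P}$-a.s. finite random variable $C_{\chi,\chi'}$ such that for any
$\ell \ge 0$,

$$
\left| p^\theta_\chi(Y_0 | Y_{-\ell}^{-1}) - p^\theta_{\chi'}(Y_0 | Y_{-\ell}^{-1}) \right| \le C_{\chi,\chi'} \varrho_{\chi,\chi'}^{\ell} , \quad \mathbb{P}\text{-a.s.}
$$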

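This implies that for any probability measure $\chi$ satisfying $\chi(\mathsf{D}) > 0$, the
sequence $(p^\theta_\chi(Y_0 | Y_{-\ell}^{-1}))_{\ell \ge 0}$ converges $\mathbb{P}$-a.s. and that the limit, denoted
by $p^\theta(Y_0 | Y_{-\infty}^{-1})$, does not depend on $\chi$. Then, by stationarity of $(Y_k)_{k \in \mathbb{Z}}$, we
obtain that for all $k \ge 0$ and $\theta \in \Theta$,

$$
\lim_{m \to \infty} p^\theta_\chi(Y_k | Y_{-m}^{k-1}) = p^\theta(Y_k | Y_{-\infty}^{k-1}) , \quad \mathbb{P}\text{-a.s.},
$$

which shows the first part of (i). To complete the proof of (i), it remains to
prove that $\mathbb{E}[|\ln p^\theta(Y_k | Y_{-\infty}^{k-1})|] < \infty$. Since $p^\theta_\chi(Y_k | Y_{-m}^{k-1}) \le \sup_{x \in \mathsf{X}} g^\theta(x, Y_k)$,
we have

$$
\ln^+ p^\theta(Y_k | Y_{-\infty}^{k-1}) \le \ln^+ \sup_{x \in \mathsf{X}} g^\theta(x, Y_k) ,
$$

which shows, under (A2), that

$$
\mathbb{E}\left[ \ln^+ p^\theta(Y_k | Y_{-\infty}^{k-1}) \right] < \infty . \tag{69}
$$

This allows us to define $\mathbb{E}[\ln p^\theta(Y_k | Y_{-\infty}^{k-1})]$ as

$$
\mathbb{E}\left[ \ln p^\theta(Y_k | Y_{-\infty}^{k-1}) \right] = \mathbb{E}\left[ \ln^+ p^\theta(Y_k | Y_{-\infty}^{k-1}) \right] - \mathbb{E}\left[ \ln^- p^\theta(Y_k | Y_{-\infty}^{k-1}) \right] ,
$$

so that $\mathbb{E}[|\ln p^\theta(Y_k | Y_{-\infty}^{k-1})|] < \infty$ provided that we have shown $\mathbb{E}[\ln p^\theta(Y_k | Y_{-\infty}^{k-1})] > -\infty$.
By stationarity of $(Y_k)_{k \in \mathbb{Z}}$,

$$
\begin{aligned}
r \, \mathbb{E}\left[ \ln p^\theta(Y_0 | Y_{-\infty}^{-1}) \right]
&= r \left\{ \mathbb{E}\left[ \ln^+ p^\theta(Y_0 | Y_{-\infty}^{-1}) \right] - \mathbb{E}\left[ \ln^- p^\theta(Y_0 | Y_{-\infty}^{-1}) \right] \right\} \\
&= \mathbb{E}\left[ \sum_{k=0}^{r-1} \ln^+ p^\theta(Y_k | Y_{-\infty}^{k-1}) \right] - \mathbb{E}\left[ \sum_{k=0}^{r-1} \ln^- p^\theta(Y_k | Y_{-\infty}^{k-1}) \right] \\
&= \mathbb{E}\left[ \sum_{k=0}^{r-1} \ln p^\theta(Y_k | Y_{-\infty}^{k-1}) \right] ,
\end{aligned}
\tag{70}
$$

where the last equality follows by applying $\mathbb{E}(A - B) = \mathbb{E}(A) - \mathbb{E}(B)$ for
nonnegative random variables $A, B$ such that $\mathbb{E}(A) < \infty$. Now, note that

$$
\prod_{k=0}^{r-1} p^\theta(Y_k | Y_{-\infty}^{k-1}) = \prod_{k=0}^{r-1} \lim_{m \to \infty} p^\theta_\chi(Y_k | Y_{-mr}^{k-1}) = \lim_{m \to \infty} \prod_{k=0}^{r-1} p^\theta_\chi(Y_k | Y_{-mr}^{k-1})
= \lim_{m \to \infty} p^\theta_\chi(Y_0^{r-1} | Y_{-mr}^{-1}) = \lim_{m \to \infty} p^\theta_\chi(Z_0 | Z_{-m}^{-1}) = p^\theta(Z_0 | Z_{-\infty}^{-1}) .
$$

By plugging this expression into (70) and using $\mathbb{E}[|\ln p^\theta(Z_0 | Z_{-\infty}^{-1})|] < \infty$
(see Proposition 10), we finally obtain

$$
r \, \mathbb{E}\left[ \ln p^\theta(Y_0 | Y_{-\infty}^{-1}) \right] = \mathbb{E}\left[ \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] > -\infty , \tag{71}
$$

which completes the proof of (i).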
Proof of (ii). Let $\chi$ be a probability measure such that $\chi(\mathsf{D}) > 0$ and
let $t \in \{0, \dots, r-1\}$. Then, for any $m \ge 0$,



(72) m-^ lnp^(Z™+^) < lnp'^^{Y^''+*) + m'^ ln+ A 



< ln/(Zo"^) + ln+ B„,t + jn^^ ln+ Am,t , 



where 



-4m,t = supsuppge.^ .)(yir+m ^) ' ^m,t = supsupp5^(yX''+*) . 

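Note that $(A_{m,t})_{m \ge 0}$ and $(B_{m,t})_{m \ge 0}$ are stationary. Moreover, using (A2),
it can be easily checked that

$$
\mathbb{E}[\ln^+ A_{m,t}] < \infty , \quad \mathbb{E}[\ln^+ B_{m,t}] < \infty .
$$

Then, Lemma 7 applies and for any $\beta \in (0,1)$, there exist $\mathbb{P}$-a.s. finite
random variables $A$, $B$ such that for all $m \ge 0$,

$$
A_{m,t} \le \beta^{-m} A , \quad B_{m,t} \le \beta^{-m} B , \quad \mathbb{P}\text{-a.s.},
$$

so that, $\mathbb{P}$-a.s.,

$$
0 \le \limsup_{m \to \infty} m^{-1} \ln^+ A_{m,t} \le -\ln \beta , \qquad
0 \le \limsup_{m \to \infty} m^{-1} \ln^+ B_{m,t} \le -\ln \beta .
$$

By letting $\beta \uparrow 1$,

$$
\lim_{m \to \infty} m^{-1} \ln^+ A_{m,t} = 0 , \quad \lim_{m \to \infty} m^{-1} \ln^+ B_{m,t} = 0 , \quad \mathbb{P}\text{-a.s.} \tag{73}
$$

Now, note that $(A_{m,t})_{m \ge 0}$ and $(B_{m,t})_{m \ge 0}$ do not depend on $\theta$, so that
(73) together with (72) yields

$$
\limsup_{m \to \infty} \sup_{\theta \in \Theta} m^{-1} \left| \ln p^\theta_\chi(Y_0^{mr+t}) - \ln p^\theta_\chi(Z_0^{m}) \right| = 0 , \quad \mathbb{P}\text{-a.s.} \tag{74}
$$

Since $t$ is chosen arbitrarily in $\{0, \dots, r-1\}$, we finally obtain, using Proposition
10-(iii),

$$
\lim_{n \to \infty} n^{-1} \ln p^\theta_\chi(Y_0^{n}) = r^{-1} \lim_{m \to \infty} m^{-1} \ln p^\theta_\chi(Z_0^{m}) = r^{-1} \mathbb{E}\left[ \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] = \mathbb{E}\left[ \ln p^\theta(Y_0 | Y_{-\infty}^{-1}) \right] , \quad \mathbb{P}\text{-a.s.},
$$

which completes the proof of Proposition 1. □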

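Proof of Theorem 2. First note that (71) implies

$$
\Theta^\star = \arg\max_{\theta \in \Theta} \mathbb{E}\left[ \ln p^\theta(Y_0 | Y_{-\infty}^{-1}) \right] = \arg\max_{\theta \in \Theta} \mathbb{E}\left[ \ln p^\theta(Z_0 | Z_{-\infty}^{-1}) \right] .
$$

Now let $t \in \{0, \dots, r-1\}$ and recall that $Z_i = Y_{ir}^{(i+1)r-1}$. Theorem 12 together
with (74) shows that

$$
\lim_{n \to \infty} d(\hat\theta_{\chi,nr+t}, \Theta^\star) = 0 , \quad \mathbb{P}\text{-a.s.} \tag{75}
$$

The proof of Theorem 2 is then completed since $t$ is arbitrary in $\{0, \dots, r-1\}$. □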

Proof of Proposition 3. Under these two conditions, for any $u \in \{1, \dots, r\}$
and $\theta \in \Theta$,



='0 



^ (n .i'^f, 9'(a^i,yi)) /••• / X(dxo)lD(Xn)ni[ 



_,(xj_i)Q^(xi_i,dxi) 



> m inf /(x,,,y,))x(Do)5" 
□

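Proof of Lemma 4. The proof proceeds by induction on $u \in \{1, \dots, r\}$.
Assume that $\mathsf{D}_{u-1}$ is a compact subset; we show that there exists a compact
set $\mathsf{D}_u$ such that $\inf_{x \in \mathsf{D}_{u-1}} \inf_{\theta \in \Theta} Q^\theta(x, \mathsf{D}_u) \ge \delta$.

Let $(x, \theta) \in \mathsf{D}_{u-1} \times \Theta$ and set $\delta < \delta' < 1$. Since $\mathsf{X} = \mathbb{R}^d$ is a complete
separable metric space and $\mathcal{X}$ is the associated Borel $\sigma$-field, there exists a
sequence $B_1, B_2, \dots$ of open balls of radius 1 covering $\mathsf{X}$. Choose $N_{x,\theta}$
large enough so that $Q^\theta(x, O_{x,\theta}) \ge \delta'$, where $O_{x,\theta} = \bigcup_{i \le N_{x,\theta}} B_i$. Since for
any open set $O$ the function $(x', \theta') \mapsto Q^{\theta'}(x', O)$ is lower semi-continuous,
there exists a neighborhood $V_{x,\theta}$ (for the product topology on $\mathsf{X} \times \Theta$) such
that for all $(x', \theta') \in V_{x,\theta}$, $Q^{\theta'}(x', O_{x,\theta}) \ge \delta$. Since $O_{x,\theta}$ is totally bounded, its
closure, denoted $K_{x,\theta}$, is a compact subset, which satisfies, for any $(x', \theta') \in V_{x,\theta}$,
$Q^{\theta'}(x', K_{x,\theta}) \ge \delta$.

Then, $\bigcup_{(x,\theta) \in \mathsf{D}_{u-1} \times \Theta} V_{x,\theta}$ is a covering of $\mathsf{D}_{u-1} \times \Theta$. Since the set $\mathsf{D}_{u-1} \times \Theta$
is compact, we may extract a finite subcover $\mathsf{D}_{u-1} \times \Theta \subset \bigcup_{i=1}^{I} V_{x_i,\theta_i}$. Take
$\mathsf{D}_u = \bigcup_{i=1}^{I} K_{x_i,\theta_i}$. As a finite union of compact sets, $\mathsf{D}_u$ is a compact set,
which satisfies, for all $(x, \theta) \in \mathsf{D}_{u-1} \times \Theta$, $Q^\theta(x, \mathsf{D}_u) \ge \delta$. This concludes the
proof. □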